📊 Model Evaluation – Classification & Regression

When building ML models, it's not enough just to train them; you also
need to evaluate how well they perform. Different problems (classification
vs. regression) call for different metrics. Let's break it down.


🔹 Binary Classification Example – Confusion Matrix

A confusion matrix compares actual labels (truth) with the model's
predictions.

  • True Positive (TP): predicted positive, actually positive
  • False Positive (FP): predicted positive, actually negative
  • True Negative (TN): predicted negative, actually negative
  • False Negative (FN): predicted negative, actually positive

👉 Goal: maximize TP and TN, minimize FP and FN.
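
If you're working in Python with scikit-learn, the four counts can be read
straight off the confusion matrix. A minimal sketch with made-up labels
(1 = positive, 0 = negative):

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented purely for illustration
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # ground truth
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # model predictions

# For binary labels, sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  TN={tn}  FN={fn}")
```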

Key Metrics

  • Precision = TP / (TP + FP)
    "Of all predicted positives, how many were actually positive?"
    Best when false positives are costly (e.g., diagnosing a healthy
    person as sick).

  • Recall = TP / (TP + FN)
    "Of all actual positives, how many did we correctly identify?"
    Best when false negatives are costly (e.g., missing a cancer
    diagnosis).

  • F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
    Harmonic mean of precision and recall.
    Best for imbalanced datasets where accuracy alone is misleading.

  • Accuracy = (TP + TN) / (All predictions)
    Useful only for balanced datasets.
    Example: If 95% of emails are "not spam," a model that always
    predicts "not spam" has 95% accuracy but is useless.
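
All four metrics have ready-made helpers in scikit-learn; a quick sketch
reusing the same kind of toy labels as above (assumed, not from the notes):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # toy model predictions

print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))       # TP / (TP + FN)
print("F1 score: ", f1_score(y_true, y_pred))           # harmonic mean of the two
print("Accuracy: ", accuracy_score(y_true, y_pred))     # (TP + TN) / all predictions
```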


🔹 AUC-ROC (Area Under the Receiver Operating Characteristic Curve)

  • Plots True Positive Rate (Sensitivity/Recall) vs False Positive
    Rate (1 - Specificity) at various thresholds.
  • AUC value ranges from 0 to 1.
    • 1.0 = perfect model
    • 0.5 = random guessing

👉 Business use case: choose a threshold that balances precision and
recall for your needs (fraud detection, medical tests, etc.).
📌 Exam Tip: Remember that AUC-ROC helps compare multiple models and find
the best threshold.
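
As a rough sketch (scikit-learn assumed, probabilities invented for
illustration), you can get both the curve points and the AUC, then scan the
thresholds for one that fits your trade-off. The top-left-corner heuristic
below is just one common way to pick a candidate threshold:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]                    # toy labels
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3]   # toy predicted probabilities

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print("AUC:", roc_auc_score(y_true, y_scores))          # 1.0 = perfect, 0.5 = random

# Candidate threshold closest to the top-left corner (high TPR, low FPR)
best = max(range(len(thresholds)), key=lambda i: tpr[i] - fpr[i])
print("Candidate threshold:", thresholds[best])
```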


🔹 Regression Model Metrics

For regression (continuous predictions, e.g., house prices, stock
values), we measure errors:

  • MAE (Mean Absolute Error): average absolute difference between
    prediction and truth.
    → Easy to interpret: "On average, the model is off by X units."

  • MAPE (Mean Absolute Percentage Error): average error as a
    percentage of the true value.
    → Useful when relative (scale-independent) error matters
    (e.g., sales forecasts across products of very different sizes).

  • RMSE (Root Mean Squared Error): penalizes large errors more
    heavily than MAE.
    → Common when big mistakes are unacceptable.

  • R² (Coefficient of Determination): measures how much variance in
    the target is explained by the model.

    • R² = 0.8 → 80% of variation is explained by the features, 20% by
      noise/other factors.
    • R² close to 1 = strong model.
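
A minimal sketch of all four metrics with scikit-learn (the arrays are made
up; mean_absolute_percentage_error needs a reasonably recent scikit-learn):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error,
                             mean_squared_error, r2_score)

y_true = np.array([200.0, 150.0, 320.0, 275.0, 410.0])   # toy actual values
y_pred = np.array([210.0, 140.0, 300.0, 290.0, 400.0])   # toy predictions

print("MAE: ", mean_absolute_error(y_true, y_pred))
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))   # returned as a fraction, 0.05 = 5%
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))      # square, average, then take the root
print("R2:  ", r2_score(y_true, y_pred))
```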


🔹 Example (Regression Metrics in Action)

You predict student test scores based on study hours:

  • RMSE = 5 → predictions are typically about 5 points off (with large
    misses weighted more heavily).
  • R² = 0.8 → 80% of the score differences are explained by study hours,
    20% by natural ability or luck.
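
One way to reproduce this kind of check end to end, with hypothetical
study-hours data invented purely for illustration (your RMSE and R² will
depend on your own data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data: hours studied vs. test score
hours  = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
scores = np.array([52, 55, 63, 61, 72, 70, 80, 85])

model = LinearRegression().fit(hours, scores)
preds = model.predict(hours)

print("RMSE:", np.sqrt(mean_squared_error(scores, preds)))   # typical size of a miss, in points
print("R^2: ", r2_score(scores, preds))                       # share of score variance explained by hours
```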

✅ Key Takeaways (Exam Perspective)

  • Classification models → Confusion Matrix, Precision, Recall, F1,
    Accuracy, AUC-ROC
  • Regression models → MAE, MAPE, RMSE, R²
  • Choose metrics based on business need:
    • Precision for costly false positives
    • Recall for costly false negatives
    • F1 for imbalanced data
    • Accuracy only for balanced datasets