AWS Certified AI Practitioner (27) - Model Evaluation - Classification & Regression
Model Evaluation - Classification & Regression
When building ML models, it's not enough to just train them; you also
need to evaluate how good they are. Different problems (classification
vs. regression) use different metrics. Let's break it down.
Binary Classification Example - Confusion Matrix
A confusion matrix compares actual labels (truth) with the model's
predictions.
- True Positive (TP): predicted positive, actually positive
- False Positive (FP): predicted positive, actually negative
- True Negative (TN): predicted negative, actually negative
- False Negative (FN): predicted negative, actually positive

Goal: maximize TP and TN, minimize FP and FN.
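To make this concrete, here is a minimal sketch using scikit-learn's confusion_matrix. The labels and predictions are made-up illustrative values, not output from any real model:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

# For binary 0/1 labels, ravel() unpacks the 2x2 matrix in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  FP={fp}  TN={tn}  FN={fn}")
```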
Key Metrics
- Precision = TP / (TP + FP)
  "Of all predicted positives, how many were actually positive?"
  Best when false positives are costly (e.g., diagnosing a healthy person as sick).
- Recall = TP / (TP + FN)
  "Of all actual positives, how many did we correctly identify?"
  Best when false negatives are costly (e.g., missing a cancer diagnosis).
- F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
  Harmonic mean of precision and recall.
  Best for imbalanced datasets where accuracy alone is misleading.
- Accuracy = (TP + TN) / (All predictions)
  Useful only for balanced datasets.
  Example: If 95% of emails are "not spam," a model that always predicts "not spam" has 95% accuracy but is useless.
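A quick sketch of these formulas in plain Python, reusing the hypothetical counts from the confusion matrix above:

```python
# Hypothetical confusion-matrix counts (TP, FP, TN, FN)
tp, fp, tn, fn = 4, 1, 4, 1

precision = tp / (tp + fp)                           # of predicted positives, how many were right
recall = tp / (tp + fn)                              # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
accuracy = (tp + tn) / (tp + fp + tn + fn)           # correct predictions over all predictions

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```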
AUC-ROC (Area Under the Receiver Operating Characteristic Curve)
- Plots True Positive Rate (Sensitivity/Recall) vs. False Positive Rate (1 - Specificity) at various thresholds.
- AUC value ranges from 0 to 1.
  - 1.0 = perfect model
  - 0.5 = random guessing
Business use case: choose a threshold that balances precision and
recall for your needs (fraud detection, medical tests, etc.).
Exam Tip: Remember that AUC-ROC helps compare multiple models and find
the best threshold.
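As a rough illustration, here is a minimal sketch with scikit-learn's roc_auc_score and roc_curve; y_scores stands in for the model's predicted probabilities, and all values are invented for the example:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                      # hypothetical ground truth
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]   # hypothetical predicted probabilities

# AUC summarizes ranking quality across all thresholds (1.0 = perfect, 0.5 = random guessing)
print("AUC:", roc_auc_score(y_true, y_scores))

# roc_curve exposes the TPR/FPR trade-off at each candidate threshold,
# which is what you inspect when picking an operating point for your use case
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```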
Regression Model Metrics
For regression (continuous predictions, e.g., house prices, stock
values), we measure errors:
- MAE (Mean Absolute Error): average absolute difference between prediction and truth.
  Easy to interpret: "On average, the model is off by X units."
- MAPE (Mean Absolute Percentage Error): average error as a percentage.
  Useful when the scale of values matters (e.g., sales forecasts).
- RMSE (Root Mean Squared Error): penalizes large errors more heavily than MAE.
  Common when big mistakes are unacceptable.
- R² (Coefficient of Determination): measures how much variance in the target is explained by the model.
  - R² = 0.8 → 80% of variation is explained by features, 20% by noise/other factors.
  - R² close to 1 = strong model.
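A minimal NumPy sketch of these four regression metrics; y_true and y_pred are made-up numbers purely for illustration:

```python
import numpy as np

y_true = np.array([250.0, 300.0, 180.0, 400.0, 220.0])  # hypothetical actual values
y_pred = np.array([245.0, 310.0, 190.0, 380.0, 230.0])  # hypothetical predictions

errors = y_true - y_pred

mae = np.mean(np.abs(errors))                    # average absolute error
mape = np.mean(np.abs(errors / y_true)) * 100    # average error as a percentage of the true value
rmse = np.sqrt(np.mean(errors ** 2))             # squaring penalizes large misses more heavily
ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variance in the target
r2 = 1 - ss_res / ss_tot                         # share of variance explained by the model

print(f"MAE={mae:.2f}  MAPE={mape:.1f}%  RMSE={rmse:.2f}  R2={r2:.3f}")
```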
Example (Regression Metrics in Action)
You predict student test scores based on study hours:
- RMSE = 5 → model predictions are ~5 points off on average.
- R² = 0.8 → 80% of score differences explained by study hours,
  20% due to natural ability or luck.
Key Takeaways (Exam Perspective)
- Classification models → Confusion Matrix, Precision, Recall, F1, Accuracy, AUC-ROC
- Regression models → MAE, MAPE, RMSE, R²
- Choose metrics based on business need:
  - Precision for costly false positives
  - Recall for costly false negatives
  - F1 for imbalanced data
  - Accuracy only for balanced datasets