
Classification

Predicting discrete categories from structured features

Classification tasks predict discrete labels or categories from input data. Common examples include spam detection, disease diagnosis, and customer churn prediction.

Common Classification Models

  • Logistic Regression: Linear model that estimates class probabilities using the sigmoid function. Fast, interpretable, works well for linearly separable data.

  • Random Forest: Ensemble of decision trees with feature randomness. Robust, handles non-linear patterns, requires minimal tuning.

  • K-Nearest Neighbors (KNN): Classifies based on similarity to training examples. Simple, non-parametric, sensitive to feature scaling and dimensionality.

See Model Families for more information on Decision Trees, Linear Models, Support Vector Machines, and Tree Ensembles.
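
Below is a minimal sketch of how the three models above are typically trained and compared, assuming scikit-learn is installed; the synthetic dataset, split size, and hyperparameters are illustrative choices, not recommendations.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic binary classification data, for illustration only
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    models = {
        # Linear model; scaling helps optimization and interpretability
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
        # Tree ensemble; needs little preprocessing or tuning
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
        # Distance-based; scaling matters because KNN compares raw feature distances
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, model.score(X_test, y_test))  # mean accuracy on the test split

Scaling is wrapped into a pipeline for Logistic Regression and KNN because both are sensitive to feature ranges; Random Forest is left unscaled, since tree splits are unaffected by monotonic feature transformations.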

Evaluation Metrics

For classification tasks, the model predicts discrete labels such as "spam" or "not spam." Here are the most common metrics used to evaluate such models:

Accuracy

The simplest metric — the fraction of correctly classified samples out of all predictions.

\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}

While intuitive, accuracy can be misleading for imbalanced datasets, where one class dominates.

The confusion matrix visualizes classification performance by showing true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Each cell represents the count of predictions for a particular actual-predicted class combination.
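
As a small sketch, again assuming scikit-learn, accuracy and the confusion matrix can be computed directly from label arrays; the y_true and y_pred values below are hypothetical.

    from sklearn.metrics import accuracy_score, confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions

    print(accuracy_score(y_true, y_pred))  # (TP + TN) / total = 6 / 8 = 0.75

    # Rows are actual classes, columns are predicted classes:
    # [[TN, FP],
    #  [FN, TP]]
    print(confusion_matrix(y_true, y_pred))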

Precision, Recall, and F1-Score

These metrics provide a deeper view of performance, especially when the cost of false positives and false negatives differs.

Precision (positive predictive value) measures how many predicted positives are actually correct.

\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}

Recall (sensitivity) measures how many actual positives the model correctly identifies.

\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}

F1-Score is the harmonic mean of precision and recall, balancing both in one metric.

\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}

Comparing precision, recall, and F1-score across different classification thresholds or classes helps you understand the trade-off between identifying all positive cases (recall) and ensuring that predicted positives are correct (precision).
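
The same hypothetical labels from the accuracy example can be reused to sketch these three metrics with scikit-learn's metric functions.

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # hypothetical model predictions

    precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
    recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
    f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
    print(precision, recall, f1)

With TP = 3, FP = 1, and FN = 1 in this toy example, all three metrics evaluate to 0.75.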

ROC Curve and AUC

For probabilistic classifiers, the ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at different thresholds. The AUC score summarizes the model's ability to distinguish between classes — a perfect classifier has an AUC of 1.0, while a random one scores around 0.5.

The ROC curve shows how the true positive rate and false positive rate change as the classification threshold varies. A curve closer to the top-left corner indicates better performance. The diagonal line represents a random classifier.
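
Here is a sketch of computing the ROC curve and AUC, again assuming scikit-learn; the logistic regression model and the synthetic data stand in for any probabilistic classifier.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # ROC/AUC need predicted scores or probabilities, not hard labels
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    probs = clf.predict_proba(X_test)[:, 1]

    fpr, tpr, thresholds = roc_curve(y_test, probs)  # one (FPR, TPR) point per threshold
    print(roc_auc_score(y_test, probs))              # 1.0 = perfect, ~0.5 = random

roc_curve returns one (FPR, TPR) pair per candidate threshold, which is what a ROC plot draws; roc_auc_score summarizes that curve as a single number.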

