Confusion Matrix
Evaluate classification model performance
Overview
A confusion matrix is a performance measurement tool for machine learning classification models. It displays the number of correct and incorrect predictions broken down by each class, showing true positives, true negatives, false positives, and false negatives in a matrix format.
Best used for:
- Evaluating classification model accuracy
- Understanding which classes are confused with each other
- Identifying bias in model predictions
- Comparing performance across different models
- Analyzing precision, recall, and F1-score by class
- Detecting overfitting or underfitting patterns
Common Use Cases
Machine Learning & AI
- Binary classification evaluation (spam/not spam, fraud/legitimate)
- Multi-class classification assessment
- Model comparison and selection
- Hyperparameter tuning evaluation
- Feature importance validation
Medical & Diagnostics
- Disease detection accuracy
- Test result validation (positive/negative)
- Screening program effectiveness
- Diagnostic tool comparison
Quality Control
- Defect detection system evaluation
- Automated inspection accuracy
- Classification system validation
- Process control monitoring
Understanding the Confusion Matrix
Binary Classification (2×2 Matrix)
                    Predicted
                    Negative   Positive
Actual  Negative       TN         FP
        Positive       FN         TP
- True Positive (TP): Correctly predicted positive
- True Negative (TN): Correctly predicted negative
- False Positive (FP): Incorrectly predicted positive (Type I error)
- False Negative (FN): Incorrectly predicted negative (Type II error)
Multi-Class Classification (N×N Matrix)
Each cell shows how many times class i was predicted as class j.
- Diagonal: Correct predictions
- Off-diagonal: Misclassifications
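The row/column convention above can be sketched directly: count each (actual, predicted) pair into an N×N array. This is a minimal sketch using numpy; the class names and labels are illustrative, not from the original.

```python
import numpy as np

# Illustrative classes and predictions (not from the original document).
classes = ["bird", "cat", "dog"]
index = {c: i for i, c in enumerate(classes)}

y_true = ["cat", "dog", "cat", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "bird", "dog", "cat"]

# Build the N x N matrix: row = actual class, column = predicted class.
cm = np.zeros((len(classes), len(classes)), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[index[t], index[p]] += 1

print(cm)
# Diagonal entries cm[i, i] are correct predictions; off-diagonal
# entries count how often class i was predicted as class j.
```

Here one "cat" was misclassified as "dog", so the cat row has a 1 in the dog column; every other prediction lands on the diagonal.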
Key Metrics Derived
Accuracy
(TP + TN) / Total
Overall correctness of the model.
Precision
TP / (TP + FP)
Of all positive predictions, how many were correct?
Recall (Sensitivity)
TP / (TP + FN)
Of all actual positives, how many did we catch?
Specificity
TN / (TN + FP)
Of all actual negatives, how many were correctly identified?
F1-Score
2 × (Precision × Recall) / (Precision + Recall)
Harmonic mean of precision and recall.
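The five formulas above can be computed directly from the four cell counts. A minimal sketch with illustrative numbers (the counts below are made up for the example):

```python
# Illustrative binary-classification counts (assumed, not from the document).
TP, TN, FP, FN = 40, 50, 5, 5

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # overall correctness
precision   = TP / (TP + FP)                   # trust in positive predictions
recall      = TP / (TP + FN)                   # share of actual positives caught
specificity = TN / (TN + FP)                   # share of actual negatives caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, specificity, f1)
```

With these counts, accuracy is 0.9 while precision and recall are both 40/45 ≈ 0.889, so the F1-score equals them exactly; the harmonic mean only drops below the arithmetic mean when precision and recall diverge.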
Settings
Normalize
Optional - Display values as proportions instead of counts.
When enabled, shows percentages or proportions instead of raw counts, making it easier to compare models trained on different dataset sizes.
Options:
- Off: Show raw counts
- On: Show normalized values (0-1 or percentages)
Annotate Cells
Optional - Display values in each cell.
Shows the numerical value (count or percentage) in each cell of the matrix.
Default: On
Tips for Interpreting Confusion Matrices
Focus on Off-Diagonal Values:
- High off-diagonal values indicate confusion between classes
- Look for systematic patterns in misclassification
- Consider class similarity when evaluating errors
Check Class Balance:
- Imbalanced datasets can have misleading accuracy
- Look at per-class metrics, not just overall accuracy
- Consider using normalization for imbalanced data
Understand Cost of Errors:
- False positives vs false negatives have different costs
- Medical: False negatives (missing disease) often worse
- Spam: False positives (blocking real email) often worse
- Adjust decision threshold based on cost
Use Normalization Wisely:
- Normalize by row (true class) to see recall per class
- Normalize by column (predicted class) to see precision
- Normalize by total to see overall distribution
Compare Multiple Models:
- Same confusion matrix format makes comparison easy
- Look for improvements in specific error types
- Consider which errors matter most for your application
Combine with Other Metrics:
- A confusion matrix shows detailed error counts, but not the full picture
- Use with ROC curves, precision-recall curves
- Consider business metrics alongside statistical ones
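The three normalization modes described above (by row, by column, by total) are each one division over the raw count matrix. A minimal numpy sketch with illustrative counts:

```python
import numpy as np

# Illustrative 2x2 count matrix: rows = actual, columns = predicted.
cm = np.array([[50, 10],
               [ 5, 35]], dtype=float)

by_row   = cm / cm.sum(axis=1, keepdims=True)  # rows sum to 1 -> recall per class
by_col   = cm / cm.sum(axis=0, keepdims=True)  # columns sum to 1 -> precision per class
by_total = cm / cm.sum()                       # cells sum to 1 -> overall distribution

print(by_row[1, 1])   # recall of the positive class: 35 / 40
print(by_col[1, 1])   # precision of the positive class: 35 / 45
```

Note that row- and column-normalized matrices answer different questions about the same cell: 0.875 of actual positives were caught, but only about 0.778 of positive predictions were correct.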
Example Scenarios
Binary Classification (Fraud Detection)
High recall is critical: missing fraud is costly.
Multi-Class Classification (Product Categories)
Shows which product categories are commonly confused.
Normalized Confusion Matrix
Easier to compare when classes have different frequencies.
Medical Diagnosis
False negatives (missing disease) are more serious than false positives.
When to Use Different Metrics
Use Accuracy When:
- Classes are balanced
- All errors have equal cost
- You need a simple single number
Use Precision When:
- False positives are costly
- You want confidence in positive predictions
- Examples: spam detection, fraud detection
Use Recall When:
- False negatives are costly
- You want to catch all positives
- Examples: disease screening, security threats
Use F1-Score When:
- You need balance between precision and recall
- Classes are imbalanced
- You want a single metric better than accuracy
Troubleshooting
Issue: Model has high accuracy but performs poorly
- Solution: Check if dataset is imbalanced. A model predicting all "negative" could have 95% accuracy if 95% of data is negative. Look at per-class metrics.
Issue: Can't see cell values clearly
- Solution: Enable "Annotate Cells" setting. Consider using normalization if numbers are very large or very small.
Issue: Hard to compare models with different sample sizes
- Solution: Enable "Normalize" to show proportions instead of raw counts. This makes models directly comparable.
Issue: Confusion between similar classes
- Solution: This is normal when classes are similar (e.g., "cat" vs "dog"). Consider combining similar classes or improving features that distinguish them.
Issue: Perfect diagonal (all correct)
- Solution: Might indicate overfitting, especially if validation performance is poor. Check if test data leaked into training.
Issue: Almost no true positives
- Solution: Model might be biased toward negative class. Check class balance, try resampling, or adjust decision threshold.
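Adjusting the decision threshold, as suggested above, directly trades false negatives for false positives. A minimal sketch with illustrative scores showing how lowering the threshold recovers true positives:

```python
import numpy as np

# Illustrative labels and predicted probabilities (assumed for the example).
y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.45, 0.80, 0.90])  # model's P(positive)

def counts(threshold):
    """Return (TP, FN, FP) for a given decision threshold."""
    y_pred = (scores >= threshold).astype(int)
    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    return tp, fn, fp

print(counts(0.5))  # default threshold: the 0.45 positive is missed
print(counts(0.4))  # lower threshold: that positive is caught, at the cost of one FP
```

At a threshold of 0.5 the model misses one positive (TP=2, FN=1, FP=0); dropping to 0.4 catches it but admits a false positive (TP=3, FN=0, FP=1). Which trade is right depends on the relative cost of each error.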