
Regression

Predicting continuous values from structured features

Regression tasks predict continuous numerical values from input data. Common examples include house price prediction, temperature forecasting, and sales estimation.

Common Regression Models

  • Linear Regression: Models the relationship between features and target as a linear combination. Fast, interpretable, assumes linear relationships and normally distributed errors.

  • Polynomial Regression: Extends linear regression by creating polynomial features. Captures non-linear relationships while maintaining the linear model structure.

See Model Families for more information on Linear Models, Decision Trees, Support Vector Machines, and Tree Ensembles.
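The sketch below is a minimal illustration of the two approaches, assuming scikit-learn and NumPy are available; the toy data is invented for this example. The polynomial model is simply a linear model fitted on expanded (degree-2) features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy data: one feature with a mildly non-linear relationship to the target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 2.0, size=100)

# Linear regression: target modeled as a linear combination of the raw features
linear = LinearRegression().fit(X, y)

# Polynomial regression: expand features to degree 2, then fit the same linear model
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(linear.score(X, y))  # R^2 of the linear fit on the training data
print(poly.score(X, y))    # R^2 of the polynomial fit, typically higher here
```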

Evaluation Metrics

When predicting continuous values, such as house prices or customer spending, regression metrics measure the magnitude of prediction errors and the quality of the fit.

Mean Squared Error (MSE)

Measures the average squared difference between actual and predicted values. Squaring ensures larger errors are penalized more heavily than smaller ones.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

MSE connects directly to the optimization objective in ordinary least squares regression. However, it is sensitive to outliers.

This visualization shows regression model predictions (line) against actual values (dots). The vertical distance between each point and the line represents the prediction error. MSE squares these distances, MAE uses their absolute values, and RMSE is the square root of MSE.
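As a minimal sketch (assuming NumPy and scikit-learn; the house-price numbers are invented for illustration), the formula can be computed directly from arrays of actual and predicted values and cross-checked against `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([250_000, 310_000, 190_000, 420_000])  # actual house prices
y_pred = np.array([245_000, 330_000, 200_000, 400_000])  # model predictions

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(mse, mean_squared_error(y_true, y_pred))  # both computations agree
```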

Root Mean Squared Error (RMSE)

The square root of MSE, providing error measurements in the same units as the target variable.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

RMSE is more interpretable than MSE. For example, if predicting house prices in dollars, RMSE will also be in dollars. Like MSE, it emphasizes large errors.
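A short sketch (assuming NumPy; the arrays are the same invented house prices as above) shows that RMSE is simply the square root of MSE and is read in the target's units, dollars in this case:

```python
import numpy as np

y_true = np.array([250_000, 310_000, 190_000, 420_000])  # prices in dollars
y_pred = np.array([245_000, 330_000, 200_000, 400_000])

# RMSE = sqrt(MSE): same units as the target, so the result reads as dollars of error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)
```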

Mean Absolute Error (MAE)

Takes the absolute value of each error instead of squaring.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

MAE treats all errors equally, regardless of size. This makes it more robust to outliers compared to MSE or RMSE. If your dataset has extreme values that are hard to predict, MAE gives a fairer picture of performance.

This visualization compares how MSE and MAE handle prediction errors. MSE squares errors, heavily penalizing large deviations (the curve grows quadratically). MAE treats all errors linearly, making it more robust to outliers. Choose MSE when large errors are unacceptable, and MAE when you want equal treatment of all errors.
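A small sketch (assuming NumPy and scikit-learn; the arrays are the same invented house prices used above) computes MAE by hand and via `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([250_000, 310_000, 190_000, 420_000])
y_pred = np.array([245_000, 330_000, 200_000, 400_000])

# MAE = (1/n) * sum(|y_i - y_hat_i|): each error counts in proportion to its size
mae = np.mean(np.abs(y_true - y_pred))
print(mae, mean_absolute_error(y_true, y_pred))  # identical values
```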

R² Score (Coefficient of Determination)

Measures the proportion of variance in the target variable explained by the model.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

The denominator represents the total variance in the data (differences between actual values and their mean), and the numerator is the unexplained variance (residual sum of squares). An $R^2$ of 1 means perfect predictions, 0 means the model does no better than predicting the mean, and negative values indicate a model that does worse than that baseline.
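The following sketch (assuming NumPy and scikit-learn; the values are arbitrary) computes $R^2$ from the residual and total sums of squares and checks the result against `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained (residual) variance
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variance around the mean
r2 = 1 - ss_res / ss_tot
print(r2, r2_score(y_true, y_pred))  # both give the same value
```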

Adjusted R²

Addresses the limitation that R² never decreases when more features are added, even if those features aren't helpful.

$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

where $n$ is the number of observations and $p$ is the number of predictors. This penalizes the addition of irrelevant variables and makes it easier to compare models with different numbers of features.
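scikit-learn does not ship an adjusted R² metric, so a minimal helper (a sketch; the function name and sample values are ours) can apply the formula on top of `r2_score`:

```python
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 for a model fitted with `n_features` predictors."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# Example: the same predictions evaluated as if they came from models
# with 2 vs. 10 predictors; the larger model is penalized more.
y_true = [3.0, -0.5, 2.0, 7.0, 4.2, 1.1, 0.3, 5.5, 6.0, 2.2, 3.3, 4.4]
y_pred = [2.5,  0.0, 2.0, 8.0, 4.0, 1.5, 0.0, 5.0, 6.5, 2.0, 3.0, 4.8]
print(adjusted_r2(y_true, y_pred, n_features=2))
print(adjusted_r2(y_true, y_pred, n_features=10))
```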

Choosing the Right Metric

No single metric is universally best:

  • MSE and RMSE: Preferred when penalizing large errors is important.
  • MAE: Useful when robustness to outliers matters.
  • R² and Adjusted R²: Helpful for explaining variance captured by the model and comparing model complexity.
