XGBoost
Optimized gradient boosting with regularization and speed
XGBoost is an optimized implementation of gradient boosting with built-in L1/L2 regularization, parallel tree construction, and native handling of missing values. It is consistently among the top-performing models on structured (tabular) data.
When to use:
- Competitive accuracy on tabular classification tasks
- Datasets with missing values (handled natively)
- When regularization is needed to prevent overfitting on small-to-medium datasets
Input: Tabular data with the feature columns defined during training
Output: Predicted class label and class probabilities
Model Settings (set during training, used at inference)
N Estimators (default: 100) Number of boosting rounds.
Max Depth (default: 6) Maximum tree depth. Values of 3–10 are common.
Learning Rate / ETA (default: 0.3) Step-size shrinkage applied to each boosting round. Lower values generalize better but require more rounds.
Subsample (default: 1.0) Row sampling ratio per tree. Values of 0.5–0.9 add regularization.
Col Sample By Tree (default: 1.0) Feature sampling ratio per tree. Reduces correlation between trees.
Min Child Weight (default: 1) Minimum sum of instance weights in a leaf. Higher values create more conservative trees.
Gamma (default: 0) Minimum loss reduction to make a split. Higher values make trees more conservative.
Lambda (default: 1) L2 regularization term on leaf weights.
Alpha (default: 0) L1 regularization term on leaf weights.
Inference Settings
No dedicated inference-time settings. The trained XGBoost ensemble produces predictions directly from the input features.