Tree Ensembles
Combining multiple decision trees to improve accuracy and stability
Decision trees are intuitive and easy to interpret, but they tend to overfit—performing well on training data but poorly on unseen data. Tree ensembles solve this problem by combining multiple trees.
Ensemble Learning
Ensemble learning combines predictions from multiple models to produce better results than any single model. Just as a group of people can make better decisions together, a group of models can produce more accurate and stable predictions.
Every model has error from its limitations and from randomness in training data. Training multiple models that make slightly different mistakes allows those errors to cancel out. This is the core idea behind ensemble learning—using diversity among models to improve performance.
Bagging (Bootstrap Aggregation)
Bagging is the most common ensemble technique for decision trees. Here's how it works (a code sketch follows the steps below):
- Bootstrap sampling: Create multiple datasets by randomly sampling from the original data with replacement (some examples appear multiple times, others not at all)
- Train models: Train a decision tree on each dataset
- Aggregate predictions:
  - Classification: Majority vote across all trees
  - Regression: Average of all predictions
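Here is a minimal sketch of those three steps, assuming scikit-learn and NumPy are available; the synthetic dataset and the choice of 50 trees are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary classification data; any tabular dataset works
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_trees = 50
rng = np.random.default_rng(0)
trees = []

# Steps 1 and 2: bootstrap sampling, then one tree per resampled dataset
for _ in range(n_trees):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # sample with replacement
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Step 3: aggregate predictions with a majority vote over binary labels
# (for regression, average the predictions instead)
votes = np.stack([tree.predict(X_test) for tree in trees])  # shape (n_trees, n_test)
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Bagged accuracy:", (y_pred == y_test).mean())
```

In practice you would typically reach for a library implementation such as sklearn.ensemble.BaggingClassifier, which wraps this loop and adds conveniences like out-of-bag scoring.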
Why bagging works:
- Decision trees are high-variance models—small data changes create very different trees
- Averaging many trees trained on slightly different data reduces this instability
- The result is a smoother, more consistent model
Bagging works best with unstable models like decision trees. Stable models like linear regression don't benefit much because their predictions barely change between samples.
Out-of-Bag (OOB) Samples
Because bootstrap sampling leaves out about one-third of data points for each tree, these unused points serve as a built-in validation set. This is called out-of-bag evaluation.
Advantages:
- No separate validation set needed
- Efficient use of all data
- Free performance estimate during training
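As a rough illustration, scikit-learn's BaggingClassifier exposes this directly through its oob_score option; the dataset and number of estimators below are placeholders, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# With oob_score=True, each tree is evaluated on the samples its bootstrap left out,
# giving a validation-style accuracy estimate without a separate held-out set.
# The default base estimator is a decision tree, so this bags decision trees.
bagging = BaggingClassifier(n_estimators=100, oob_score=True, random_state=0)
bagging.fit(X, y)

print("OOB accuracy:", bagging.oob_score_)  # the "free" performance estimate
```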
Random Forest
Random Forest extends bagging by adding feature randomness. Instead of considering all features at each split, each tree only looks at a random subset of features. This forces trees to be more diverse, further reducing variance.
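As a brief example, the max_features parameter of scikit-learn's RandomForestClassifier controls that per-split feature subset; the values below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# max_features="sqrt": each split considers only a random subset of about
# sqrt(n_features) features, which decorrelates the trees.
# max_features=None would let every split see all features (plain bagging of trees).
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

print("Cross-validated accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```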
For a comprehensive guide on Random Forest, including feature randomness, hyperparameters, tuning strategies, and feature importance, see Random Forest.
Other Tree Ensemble Methods
Beyond Random Forest, other popular tree ensemble methods include:
- Gradient Boosting: Sequentially trains trees, each one correcting the errors of the ensemble so far (XGBoost, LightGBM, CatBoost); see the sketch at the end of this section
- AdaBoost: Reweights training examples so that later trees focus on those misclassified earlier
- Extra Trees: Uses random splits instead of optimal splits for even more diversity
Each method trades off bias and variance differently, so the best choice depends on the dataset and the problem.
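To give a flavor of the boosting side, here is a minimal gradient boosting sketch using scikit-learn's GradientBoostingClassifier; the hyperparameters are illustrative, and libraries such as XGBoost and LightGBM provide their own optimized implementations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unlike bagging, trees are added sequentially: each new shallow tree fits the
# remaining errors of the current ensemble, scaled by the learning rate
boosted = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
boosted.fit(X_train, y_train)

print("Test accuracy:", boosted.score(X_test, y_test))
```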