Tree Ensembles
Combining multiple decision trees to improve accuracy and stability
Decision trees are intuitive and easy to interpret, but they tend to overfit—performing well on training data but poorly on unseen data. Tree ensembles solve this problem by combining multiple trees.
Ensemble Learning
Ensemble learning combines predictions from multiple models to produce better results than any single model. Just as a group of people can make better decisions together, a group of models can produce more accurate and stable predictions.
Every model has error from its limitations and from randomness in training data. Training multiple models that make slightly different mistakes allows those errors to cancel out. This is the core idea behind ensemble learning—using diversity among models to improve performance.
Bagging (Bootstrap Aggregation)
Bagging is the most common ensemble technique for decision trees. Here's how it works (a code sketch follows the steps below):
- Bootstrap sampling: Create multiple datasets by randomly sampling from the original data with replacement (some examples appear multiple times, others not at all)
- Train models: Train a decision tree on each dataset
- Aggregate predictions:
  - Classification: Majority vote across all trees
  - Regression: Average of all predictions
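Here is a minimal sketch of those three steps, assuming scikit-learn and NumPy are available; the synthetic dataset and the choice of 50 trees are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary classification data; any tabular dataset works
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

n_trees = 50
rng = np.random.default_rng(0)
trees = []

# Steps 1 and 2: bootstrap sampling, then one tree per resampled dataset
for _ in range(n_trees):
    idx = rng.integers(0, len(X_train), size=len(X_train))  # sample with replacement
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx]))

# Step 3: aggregate predictions with a majority vote over binary labels
# (for regression, average the predictions instead)
votes = np.stack([tree.predict(X_test) for tree in trees])  # shape (n_trees, n_test)
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Bagged accuracy:", (y_pred == y_test).mean())
```

In practice you would typically reach for a library implementation such as sklearn.ensemble.BaggingClassifier, which wraps this loop and adds conveniences like out-of-bag scoring.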
Why bagging works:
- Decision trees are high-variance models—small data changes create very different trees
- Averaging many trees trained on slightly different data reduces this instability
- The result is a smoother, more consistent model
Bagging works best with unstable models like decision trees. Stable models like linear regression don't benefit much because their predictions barely change between samples.
Out-of-Bag (OOB) Samples
Because bootstrap sampling leaves out about one-third of data points for each tree, these unused points serve as a built-in validation set. This is called out-of-bag evaluation.
Advantages:
- No separate validation set needed
- Efficient use of all data
- Free performance estimate during training
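As a rough illustration, scikit-learn's BaggingClassifier exposes this directly through its oob_score option; the dataset and number of estimators below are placeholders, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# With oob_score=True, each tree is evaluated on the samples its bootstrap left out,
# giving a validation-style accuracy estimate without a separate held-out set.
# The default base estimator is a decision tree, so this bags decision trees.
bagging = BaggingClassifier(n_estimators=100, oob_score=True, random_state=0)
bagging.fit(X, y)

print("OOB accuracy:", bagging.oob_score_)  # the "free" performance estimate
```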
Random Forest
Random Forest extends bagging by adding feature randomness. Instead of considering all features at each split, each tree only looks at a random subset of features. This forces trees to be more diverse, further reducing variance.
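As a brief example, the max_features parameter of scikit-learn's RandomForestClassifier controls that per-split feature subset; the values below are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# max_features="sqrt": each split considers only a random subset of about
# sqrt(n_features) features, which decorrelates the trees.
# max_features=None would let every split see all features (plain bagging of trees).
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

print("Cross-validated accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```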
For a comprehensive guide on Random Forest, including feature randomness, hyperparameters, tuning strategies, and feature importance, see Random Forest.
Other Tree Ensemble Methods
Beyond Random Forest, other popular tree ensemble methods include:
- Gradient Boosting: Sequentially trains trees, each one correcting the errors of the ensemble so far (XGBoost, LightGBM, CatBoost); see the sketch at the end of this section
- AdaBoost: Reweights training examples so that later trees focus on those misclassified earlier
- Extra Trees: Uses random splits instead of optimal splits for even more diversity
Each method trades off bias and variance differently, so the best choice depends on the dataset and the problem.
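To give a flavor of the boosting side, here is a minimal gradient boosting sketch using scikit-learn's GradientBoostingClassifier; the hyperparameters are illustrative, and libraries such as XGBoost and LightGBM provide their own optimized implementations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unlike bagging, trees are added sequentially: each new shallow tree fits the
# remaining errors of the current ensemble, scaled by the learning rate
boosted = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
boosted.fit(X_train, y_train)

print("Test accuracy:", boosted.score(X_test, y_test))
```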