Random Forest
Ensemble of decision trees that averages predictions. Each tree sees a random subset of data and features.
When to use:
- Robust baseline - works well on most problems
- Handles non-linear relationships naturally
- Can handle missing values (implementation-dependent; not all libraries support this natively)
- Feature importance needed
- Resistant to overfitting
Strengths: Very accurate, handles non-linearity, robust to noise and outliers, provides feature importance.
Weaknesses: Can be slow to train and predict with many trees, large model size, less interpretable than linear models.
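As a minimal sketch of the baseline-plus-feature-importance workflow, here is a Random Forest classifier using scikit-learn; the synthetic dataset and parameter values are illustrative, not prescriptive:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative dataset: 8 features, only 3 of which are informative
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)

print("test accuracy:", clf.score(X_te, y_te))
# Impurity-based importances, one per feature, summing to 1
print("feature importances:", clf.feature_importances_)
```

With no tuning at all, this tends to land near the accuracy ceiling for the dataset, which is why Random Forest makes a strong first baseline.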
Model Parameters
N Estimators (default: 100) Number of trees in the forest. More trees generally improve accuracy, with diminishing returns, at the cost of training and prediction time.
- 50-100: Fast training
- 100-300: Good default
- 500+: Maximum accuracy, slower
Max Depth Maximum tree depth. Controls model complexity.
- None: Trees grow until pure (may overfit)
- Low (3-10): Simple, prevents overfitting
- High (20-50): Complex patterns, may overfit
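The effect of Max Depth can be seen by comparing training accuracy of a shallow forest against an unrestricted one; this is an illustrative scikit-learn sketch, not a tuning recipe:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# max_depth=3: simple trees, limited capacity
shallow = RandomForestClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
# max_depth=None: trees grow until leaves are pure, maximum capacity
deep = RandomForestClassifier(max_depth=None, random_state=0).fit(X_tr, y_tr)

print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
```

The unrestricted forest fits the training set almost perfectly; whether that hurts test accuracy depends on how noisy the data is, which is what the depth cap guards against.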
Min Samples Split (default: 2) Minimum samples needed to split a node. Higher values prevent overfitting.
Min Samples Leaf (default: 1) Minimum samples in a leaf node. Higher values create smoother predictions.
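To illustrate how Min Samples Leaf smooths predictions, this sketch fits a regressor on noisy 1-D data and counts leaves in one tree; larger leaves mean fewer, coarser regions and hence smoother output (scikit-learn assumed, data illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=300)  # noisy sine wave

# min_samples_leaf=1: trees chase individual noisy points
rough = RandomForestRegressor(min_samples_leaf=1, random_state=0).fit(X, y)
# min_samples_leaf=20: each leaf averages at least 20 samples
smooth = RandomForestRegressor(min_samples_leaf=20, random_state=0).fit(X, y)

print("leaves per tree (min_samples_leaf=1): ", rough.estimators_[0].get_n_leaves())
print("leaves per tree (min_samples_leaf=20):", smooth.estimators_[0].get_n_leaves())
```

Raising min_samples_split has a similar regularizing effect, but acts on internal splits rather than directly on leaf size.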
Max Features Features to consider at each split:
- sqrt: Square root of total features (common default for classification)
- log2: Log2 of total features
- None: Use all features (common default for regression)
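The Max Features setting resolves to a concrete per-split count once the forest is fitted; this scikit-learn sketch shows sqrt of 16 features resolving to 4:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=16, random_state=0)

rf = RandomForestClassifier(max_features="sqrt", random_state=0).fit(X, y)
# Each fitted tree exposes the resolved per-split feature count
print(rf.estimators_[0].max_features_)  # sqrt(16) = 4 features per split
```

Restricting features per split decorrelates the trees, which is what makes averaging them effective.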
Bootstrap (default: true) Whether to use bootstrap sampling. Keep true for better generalization.
Random State (default: 42) Seed for reproducibility.
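Because both bootstrap sampling and feature subsetting are random, fixing the seed is what makes runs reproducible; this sketch shows two forests with the same random_state producing identical predictions (values illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Same seed -> same bootstrap samples, same feature subsets, same trees
pred_a = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y).predict(X)
pred_b = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y).predict(X)

print("identical predictions:", (pred_a == pred_b).all())
```

Setting bootstrap=False removes the sampling step entirely; each tree then trains on the full dataset, which usually generalizes worse.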