BIRCH
Balanced Iterative Reducing and Clustering using Hierarchies - memory-efficient hierarchical clustering for very large datasets.
When to use:
- Very large datasets that don't fit in memory
- Need fast online clustering
- Roughly spherical clusters
- Memory constraints
Strengths: Very fast, memory efficient, online learning, handles large datasets, incremental
Weaknesses: Assumes spherical clusters, sensitive to threshold, order-dependent results
Model Parameters
N Clusters (default: 3, required) Number of clusters produced by the final clustering step.
Threshold (default: 0.5) The radius of the subcluster obtained by merging a new sample with its closest subcluster must stay below this value; otherwise a new subcluster is started. Key parameter.
- Low (0.1-0.3): Many small clusters, high memory
- Medium (0.5): Balanced
- High (1.0+): Few large clusters, low memory
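The threshold's effect on the CF tree can be seen directly: lower values produce more, smaller subclusters (more memory), higher values fewer, larger ones. A minimal sketch, assuming scikit-learn's Birch (whose threshold parameter matches the one described here) and synthetic blob data invented for illustration:

```python
import numpy as np
from sklearn.cluster import Birch

# Three synthetic Gaussian blobs (illustrative data, not from the text).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(100, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(100, 2)),
    rng.normal(loc=6.0, scale=0.3, size=(100, 2)),
])

# n_clusters=None skips the final clustering step, so we can inspect
# the raw CF-tree subclusters at each threshold.
for threshold in (0.1, 0.5, 1.5):
    model = Birch(threshold=threshold, n_clusters=None).fit(X)
    print(threshold, len(model.subcluster_centers_))
```

On data like this, the subcluster count drops sharply as the threshold grows, which is the memory trade-off described above.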
Branching Factor (default: 50) Maximum number of subclusters in each node.
- 10-30: Deeper tree, slower
- 50: Good default
- 100+: Shallower tree, faster
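Putting the three parameters together, here is a hedged end-to-end sketch, again assuming scikit-learn's Birch; the chunked `partial_fit` loop illustrates the online/incremental use case, and the blob data is invented for illustration:

```python
import numpy as np
from sklearn.cluster import Birch

# Three well-separated synthetic blobs (illustrative data).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=c, scale=0.3, size=(200, 2))
    for c in ((0.0, 0.0), (5.0, 0.0), (0.0, 5.0))
])

# Defaults from the parameter list above.
model = Birch(n_clusters=3, threshold=0.5, branching_factor=50)

# Online learning: feed the data in chunks instead of all at once,
# as you would with a stream that does not fit in memory.
for chunk in np.array_split(X, 6):
    model.partial_fit(chunk)

labels = model.predict(X)
print(len(set(labels)))
```

Note that results can depend on the order in which chunks arrive (the order-dependence listed under Weaknesses), so shuffling or repeating passes over the stream may change the clustering.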