Hierarchical Clustering
Builds a tree (dendrogram) of clusters by iteratively merging or splitting groups based on distance.
When to use:
- Want to visualize cluster hierarchy with dendrogram
- Need clusters at multiple granularities
- Relatively small dataset (<10k samples)
- Want deterministic results
Strengths: Creates a hierarchical structure, no need to specify k upfront, deterministic, visualizable with a dendrogram.
Weaknesses: Slow on large datasets (time and memory grow at least quadratically in the number of samples), sensitive to noise and outliers, cannot undo merges.
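A minimal usage sketch, assuming these parameters map onto scikit-learn's `AgglomerativeClustering` (an assumption about the backing implementation):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated blobs of three points each.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # each blob's three points share one label
```

Label numbering (which blob gets 0 vs 1) is arbitrary; only the grouping is deterministic.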
Model Parameters
N Clusters (default: 2, required) Number of clusters to extract from the hierarchy.
Linkage (default: "ward") How to measure distance between clusters:
- ward: Minimizes within-cluster variance (default, best for most cases; requires the euclidean metric)
- complete: Maximum distance between all point pairs
- average: Average distance between all point pairs
- single: Minimum distance (can create long chains)
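One way to see how the criteria differ is to compare the distance at which the final (root) merge happens. This sketch uses SciPy's `linkage` (assumed available) on 1-D data with two groups:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two groups on a line: {0.0, 0.5, 1.0} and {9.0, 9.5}.
X = np.array([[0.0], [0.5], [1.0], [9.0], [9.5]])

dists = {}
for method in ("ward", "complete", "average", "single"):
    Z = linkage(X, method=method)   # each row: (cluster_i, cluster_j, distance, size)
    dists[method] = Z[-1, 2]        # distance at the root merge
    print(method, dists[method])
```

Single linkage reports the closest cross-group pair, complete the farthest, and average sits in between, which is why single linkage tends to chain clusters together.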
Metric (default: "euclidean") Distance metric for computing linkage:
- euclidean: Standard distance (required for ward)
- manhattan: L1 distance
- cosine: Angle-based similarity
- Others: l1, l2, correlation, etc.
Distance Threshold (optional) Stop merging when the linkage distance exceeds this threshold; the number of clusters then follows from the data rather than being fixed upfront. If set, n_clusters must be None.
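A sketch of threshold-driven clustering, assuming the scikit-learn backend: cheap within-blob merges happen below the threshold, and the expensive cross-blob merge is refused, so the cluster count emerges from the data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

# n_clusters must be None when distance_threshold is used.
model = AgglomerativeClustering(n_clusters=None, distance_threshold=5.0)
labels = model.fit_predict(X)
print(model.n_clusters_)  # number of clusters found under the threshold
```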
Compute Full Tree (default: "auto") Whether to compute the full tree or stop early:
- auto: Automatically decide based on parameters
- true: Compute full dendrogram
- false: Stop early once n_clusters is reached (faster, uses less memory)
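Computing the full tree is what enables the dendrogram view. A sketch using SciPy (assumed available) that builds the full tree and extracts the dendrogram layout without plotting; the returned dict can be handed to matplotlib for an actual drawing:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]], dtype=float)

Z = linkage(X, method="ward")        # full tree: n - 1 merge rows
info = dendrogram(Z, no_plot=True)   # layout data only, no figure
print(info["ivl"])                   # leaf labels in left-to-right order
```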