Hierarchical Clustering
Tree-based agglomerative clustering with linkage-defined distance
Hierarchical Clustering builds a tree of clusters (dendrogram) by iteratively merging the closest pairs of clusters. The resulting structure can be cut at any level to produce the desired number of clusters.
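The merge-then-cut workflow described above can be sketched with scipy's hierarchical clustering routines (a minimal illustration, not the platform's internal implementation):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated blobs of 10 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),
               rng.normal(5, 0.5, (10, 2))])

# Build the full merge tree (dendrogram) by iteratively
# merging the closest pair of clusters under Ward linkage
Z = linkage(X, method="ward")

# Cut the tree at the level that yields exactly 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
```

Because the full tree `Z` is retained, the same fit can be re-cut at a different `t` without recomputing the merges.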
When to use:
- Exploring cluster hierarchies at different granularities
- When the number of clusters is uncertain and visual exploration of the dendrogram helps
- Small-to-medium datasets where O(n²) memory is acceptable
Input: Tabular data with the feature columns defined during training
Output: Cluster label for each row
Model Settings (set during training, used at inference)
N Clusters (default: 2) Number of clusters to produce by cutting the dendrogram.
Linkage (default: ward)
Merge criterion. ward minimizes within-cluster variance (good default); complete uses the maximum inter-cluster distance; average uses the average pairwise distance; single uses the minimum distance (prone to producing long chain-like clusters).
Metric (default: euclidean)
Distance metric used to compute pairwise distances between observations. Only euclidean is valid with ward linkage.
Inference Settings
No dedicated inference-time settings. New points are assigned to the cluster of their nearest training neighbor.
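The nearest-training-neighbor assignment rule can be sketched as follows (an illustrative approximation of the inference step, since scikit-learn's agglomerative estimator has no `predict` method):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(0, 0.3, (10, 2)),
                     rng.normal(4, 0.3, (10, 2))])

# Fit on the training data and keep its cluster labels
model = AgglomerativeClustering(n_clusters=2)
train_labels = model.fit_predict(X_train)

# New rows inherit the cluster of their nearest training point
nn = NearestNeighbors(n_neighbors=1).fit(X_train)
X_new = np.array([[0.1, -0.2], [4.2, 3.9]])
_, idx = nn.kneighbors(X_new)
new_labels = train_labels[idx.ravel()]
```

This makes inference O(n) per query against the training set; no dendrogram traversal is needed at prediction time.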