Spectral Clustering
Uses the spectrum (eigenvalues and eigenvectors) of a similarity matrix built from the data to reduce dimensionality, then clusters the points in that lower-dimensional space.
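A minimal NumPy sketch of that pipeline. It follows the normalized variant described by Ng, Jordan and Weiss, not any specific library's code: build an RBF similarity matrix, use the leading eigenvectors of its normalized form as a low-dimensional embedding, then run K-Means on the rows of that embedding. The function name, gamma=1.0, and the farthest-point K-Means initialization are assumptions made for illustration.

```python
import numpy as np

def spectral_clustering_sketch(X, n_clusters, gamma=1.0, n_iter=100):
    """Illustrative spectral clustering: RBF affinity -> spectral embedding -> K-Means."""
    X = np.asarray(X, dtype=float)

    # 1. Similarity matrix: A_ij = exp(-gamma * ||x_i - x_j||^2), no self-loops
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-gamma * sq_dists)
    np.fill_diagonal(A, 0.0)

    # 2. Symmetric normalization M = D^{-1/2} A D^{-1/2}
    #    (the normalized Laplacian is I - M, so both share eigenvectors)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    M = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # 3. Dimensionality reduction: keep the eigenvectors belonging to the
    #    n_clusters largest eigenvalues of M (eigh sorts eigenvalues ascending)
    _, eigvecs = np.linalg.eigh(M)
    U = eigvecs[:, -n_clusters:]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # 4. K-Means on the rows of U; farthest-point initialization keeps it deterministic
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)

    for _ in range(n_iter):
        d2 = np.sum((U[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([
            U[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Two tight, well-separated groups of three points
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
labels = spectral_clustering_sketch(X, n_clusters=2)
print(labels)
```

On these two groups the sketch returns [0 0 0 1 1 1]: each group collapses to a single point in the embedding, so the final K-Means step is trivial.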
When to use:
- Have non-convex clusters (circles, spirals)
- Graph-structured data
- Small to medium datasets
- Need to capture complex cluster shapes
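Strengths: Handles non-convex shapes, works with graph data, can capture complex patterns, makes few assumptions about cluster shape
Weaknesses: Slow on large datasets (the n × n similarity matrix and its eigendecomposition grow quickly with n), requires specifying k (N Clusters), sensitive to how the similarity matrix is built (Affinity, N Neighbors), memory intensive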
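The memory cost is easy to estimate: a dense similarity matrix has n × n entries. The helper below is a back-of-the-envelope sketch that assumes dense float64 storage, as with the rbf affinity; a nearest_neighbors affinity can be stored sparsely with on the order of n × N Neighbors nonzero entries, which is one reason it scales further. The function name is hypothetical.

```python
def dense_affinity_bytes(n_samples: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed for a dense n_samples x n_samples affinity matrix.

    bytes_per_entry=8 assumes float64 storage.
    """
    return n_samples * n_samples * bytes_per_entry


for n in (1_000, 10_000, 50_000):
    gib = dense_affinity_bytes(n) / 2**30
    print(f"{n:>6} samples -> {gib:6.2f} GiB")
```

That is about 0.75 GiB at 10,000 samples and about 18.6 GiB at 50,000, before counting any working memory the eigensolver needs.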
Model Parameters
N Clusters (default: 8, required) Number of clusters to form.
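Because N Clusters has to be chosen up front, one common heuristic is the eigengap: compute the smallest eigenvalues of the normalized graph Laplacian and pick k just before the largest jump, since each well-separated group contributes one eigenvalue near zero. This page does not say the tool does this; the sketch below is only a way to pick a starting value, and it assumes an RBF affinity with gamma=1.0 and a hypothetical helper name, eigengap_estimate.

```python
import numpy as np

def eigengap_estimate(X, max_k=10, gamma=1.0):
    """Suggest a cluster count from the largest gap in the Laplacian spectrum."""
    X = np.asarray(X, dtype=float)
    # RBF affinity without self-loops
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-gamma * sq_dists)
    np.fill_diagonal(A, 0.0)
    # Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = np.eye(len(X)) - A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Eigenvalues in ascending order; roughly one near-zero value per separated group
    eigvals = np.linalg.eigvalsh(L)
    max_k = min(max_k, len(X) - 1)
    # gaps[i] is the jump between eigenvalue i and i + 1, i.e. after i + 1 clusters
    gaps = np.diff(eigvals[: max_k + 1])
    return int(np.argmax(gaps)) + 1

# Three tight groups of three points, far apart from each other
offsets = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
X = np.vstack([offsets + c for c in ([0.0, 0.0], [10.0, 0.0], [0.0, 10.0])])
k = eigengap_estimate(X)
print(k)
```

On real data the gap is rarely this clean, so treat the estimate as a starting point for N Clusters rather than an answer.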
Affinity (default: "rbf") How to construct the similarity matrix:
- rbf: Radial basis function kernel (default, good for continuous data)
- nearest_neighbors: K-nearest neighbors graph
- precomputed: Use your own affinity matrix
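The two built-in options can be sketched in NumPy. This mirrors how scikit-learn's SpectralClustering, which these parameters appear to follow, describes them: rbf as exp(-gamma × squared distance), and nearest_neighbors as a 0/1 k-nearest-neighbor connectivity matrix C symmetrized to 0.5 × (C + Cᵀ). gamma=1.0 is borrowed from scikit-learn's default because this page lists no kernel-width parameter; the function names are hypothetical. Any symmetric, non-negative n × n matrix built this way (or from your own domain knowledge) is the kind of input precomputed expects.

```python
import numpy as np

def rbf_affinity(X, gamma=1.0):
    """Dense RBF affinity: A_ij = exp(-gamma * ||x_i - x_j||^2)."""
    X = np.asarray(X, dtype=float)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def knn_affinity(X, n_neighbors=10):
    """Symmetrized k-nearest-neighbor connectivity (each point counts itself)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # The first n_neighbors columns of each sorted row are the nearest points;
    # the point itself comes first because its distance is 0
    nearest = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    C = np.zeros((n, n))
    C[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    # 1.0 where the neighbor relation is mutual, 0.5 where it is one-sided
    return 0.5 * (C + C.T)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
A_rbf = rbf_affinity(X)                  # dense, every pair gets a weight
A_knn = knn_affinity(X, n_neighbors=5)   # mostly zeros, only local edges
print((A_rbf > 0).sum(), (A_knn > 0).sum())
```

The contrast is the design trade-off: rbf links every pair with a smoothly decaying weight (so its scale matters), while nearest_neighbors keeps only local edges, which is what lets it follow thin, curved clusters.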
N Neighbors (default: 10) Number of neighbors used to build the graph when Affinity is nearest_neighbors (not used by rbf).
- 5-10: Local structure
- 10-20: Good default
- 20+: More global structure
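Why N Neighbors shifts the balance between local and global structure: each extra neighbor can add an edge to a more distant region. The sketch below counts connected components of the nearest_neighbors graph on two small, far-apart squares. With 4 neighbors (the point itself plus its 3 square-mates) the squares stay separate; at 5, every point reaches into the other square and the graph becomes one component. The data and helper names are illustrative assumptions, and the graph construction follows the symmetrized form 0.5 × (C + Cᵀ).

```python
from collections import deque

import numpy as np

def knn_graph(X, n_neighbors):
    """Symmetrized k-nearest-neighbor connectivity (each point counts itself)."""
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    nearest = np.argsort(sq_dists, axis=1)[:, :n_neighbors]
    C = np.zeros((n, n))
    C[np.repeat(np.arange(n), n_neighbors), nearest.ravel()] = 1.0
    return 0.5 * (C + C.T)

def count_components(A):
    """Number of connected components of the graph with adjacency A (BFS)."""
    n = A.shape[0]
    seen = np.zeros(n, dtype=bool)
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in np.flatnonzero(A[i] > 0):
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
    return components

# Two unit squares, 9 units apart at their closest corners
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 0], [10, 1], [11, 0], [11, 1]], dtype=float)
components = {k: count_components(knn_graph(X, k)) for k in (4, 5)}
print(components)
```

Too few neighbors can split one real cluster into pieces; too many can bridge clusters that should stay apart, which is why the ranges above are starting points rather than rules.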
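Assign Labels (default: "kmeans") Strategy for assigning labels in the spectral embedding space:
- kmeans: Run K-Means on the embedding (default; widely used, but depends on its random initialization)
- discretize: Discretization method (less sensitive to random initialization)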
Random State (default: 42) Seed for reproducibility.
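Putting the parameters together: a sketch that assumes they map onto scikit-learn's sklearn.cluster.SpectralClustering arguments (n_clusters, affinity, n_neighbors, assign_labels, random_state). The names and defaults line up, but this page does not state the mapping, so treat it as an assumption. The example fits two concentric rings, a non-convex shape that K-Means on the raw coordinates cannot separate, and compares both Assign Labels strategies with adjusted_rand_score, where 1.0 means the rings were recovered exactly. n_clusters is set to 2 instead of the default 8 because the data has two rings.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Two noisy concentric rings: a classic non-convex clustering case
X, y_true = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

scores = {}
for assign_labels in ("kmeans", "discretize"):
    model = SpectralClustering(
        n_clusters=2,                   # N Clusters
        affinity="nearest_neighbors",   # Affinity
        n_neighbors=10,                 # N Neighbors
        assign_labels=assign_labels,    # Assign Labels
        random_state=42,                # Random State
    )
    labels = model.fit_predict(X)
    scores[assign_labels] = adjusted_rand_score(y_true, labels)
    print(assign_labels, round(scores[assign_labels], 3))
```

scikit-learn may warn that the graph is not fully connected when the two rings share no neighbor edges; in this case that disconnection is exactly what separates the clusters.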