Mini Batch K-Means
Fast variant of K-Means that uses mini-batches of data to reduce computation time.
When to use:
- Very large datasets (>10k samples)
- Need faster training than K-Means
- Can accept slightly lower quality for speed
- Memory constraints
Strengths: Much faster than K-Means, lower memory usage, good for large datasets, similar quality to K-Means.
Weaknesses: Slightly less accurate than K-Means, more sensitive to initialization.
Model Parameters
N Clusters (default: 8, required) Number of clusters to form.
Init Method (default: "k-means++") How to initialize cluster centers:
- k-means++: Spreads initial centers apart (usually better clusters, fewer bad runs)
- random: Picks random samples as centers (faster, but more sensitive to a bad start)
Max Iterations (default: 100) Maximum iterations over the complete dataset.
- 50-100: Usually sufficient for large data
- 200+: For better convergence
Batch Size (default: 1024) Size of the mini-batches used for training.
- 256-512: Small batches (more updates, slower)
- 1024: Good default
- 2048+: Large batches (fewer updates, faster)
Random State (default: 42) Seed for reproducibility.
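The parameters above map directly onto scikit-learn's MiniBatchKMeans keyword arguments; a sketch with the documented defaults spelled out (assuming scikit-learn is the backing implementation):

```python
from sklearn.cluster import MiniBatchKMeans

# Each keyword mirrors a parameter from the reference above,
# set to its documented default.
model = MiniBatchKMeans(
    n_clusters=8,        # N Clusters
    init="k-means++",    # Init Method
    max_iter=100,        # Max Iterations
    batch_size=1024,     # Batch Size
    random_state=42,     # Random State
    n_init="auto",       # assumption: silences the n_init FutureWarning on recent versions
)
```

Raising batch_size (e.g. to 2048+) reduces the number of update steps per pass, trading some cluster quality for speed, as noted in the Batch Size guidance.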