Gaussian Mixture Model
Probabilistic model that assumes data comes from a mixture of Gaussian distributions with unknown parameters.
When to use:
- Want probabilistic cluster assignments
- Clusters have elliptical shapes
- Need uncertainty estimates
- Data within each cluster is approximately normally distributed
Strengths: Soft clustering (probabilities), flexible cluster shapes (elliptical), model selection with BIC/AIC, handles overlapping clusters
Weaknesses: Assumes Gaussian distribution, sensitive to initialization, can overfit with too many components
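The difference between soft and hard assignments can be illustrated with a minimal sketch. It assumes the model behaves like scikit-learn's GaussianMixture (this page does not name its backend), and the data is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: two overlapping 2-D Gaussian blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),
    rng.normal(loc=[3.0, 3.0], scale=1.0, size=(200, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=42).fit(X)

hard_labels = gmm.predict(X)       # one component index per point
soft_probs = gmm.predict_proba(X)  # shape (400, 2); each row sums to 1

# Points in the overlap region have no clearly dominant component.
uncertain = soft_probs.max(axis=1) < 0.9
print(f"{uncertain.sum()} of {len(X)} points have no component above 0.9")
```

The rows of soft_probs are the uncertainty estimates mentioned above: a row near (0.5, 0.5) marks a point that could plausibly belong to either cluster.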
Model Parameters
N Components (default: 1, required) Number of Gaussian components (clusters). Similar to k in K-Means.
- Use BIC/AIC scores to select optimal number
- Too few: Underfits complex data
- Too many: Overfits, finds spurious clusters
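One possible way to pick the number of components with BIC is sketched below. It assumes a scikit-learn GaussianMixture backend; the candidate range 1 to 6 and the synthetic three-blob data are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: three well-separated blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(150, 2)),
    rng.normal(loc=[5.0, 0.0], scale=0.5, size=(150, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.5, size=(150, 2)),
])

candidates = list(range(1, 7))
bic_scores = []
for k in candidates:
    gmm = GaussianMixture(n_components=k, random_state=42).fit(X)
    bic_scores.append(gmm.bic(X))  # lower BIC is better

best_k = candidates[int(np.argmin(bic_scores))]
for k, score in zip(candidates, bic_scores):
    print(f"k={k}: BIC={score:.1f}")
print("Selected n_components:", best_k)
```

Replacing gmm.bic(X) with gmm.aic(X) gives the AIC variant. AIC penalizes extra components less heavily, so it tends to favor larger models.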
Covariance Type (default: "full") Shape of covariance matrices:
- full: Each component has its own covariance matrix (most flexible)
- tied: All components share same covariance (assumes similar shapes)
- diag: Diagonal covariance (axis-aligned ellipses, faster)
- spherical: Single variance per component (similar to K-Means)
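The four options differ in how many covariance parameters are estimated. The sketch below makes this visible by printing the shape of the fitted covariance array for each option. It assumes scikit-learn's GaussianMixture, with 3 components and 2 features chosen for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # 300 points, 2 features (illustrative only)

shapes = {}
for cov_type in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(
        n_components=3, covariance_type=cov_type, random_state=42
    ).fit(X)
    shapes[cov_type] = gmm.covariances_.shape
    print(f"{cov_type:>9}: covariances_.shape = {shapes[cov_type]}")

# full      -> (3, 2, 2): one 2x2 matrix per component
# tied      -> (2, 2):    one 2x2 matrix shared by all components
# diag      -> (3, 2):    one variance per feature per component
# spherical -> (3,):      one variance per component
```

Fewer parameters mean faster fitting and less risk of overfitting on small datasets, at the cost of less flexible cluster shapes.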
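Tolerance (default: 0.001) Convergence threshold. EM stops once the improvement in log-likelihood between iterations falls below this value, so lower values generally require more iterations.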
- 0.001-0.01: Standard
- <0.001: Stricter convergence
Max Iterations (default: 100) Maximum EM iterations to perform.
- 100: Usually sufficient
- 200+: For difficult convergence
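Tolerance and max iterations interact: a fit stops when either limit is reached first. The sketch below shows one way to check which one applied, assuming a scikit-learn GaussianMixture backend. The retry budget of 500 iterations is an arbitrary illustrative value.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.mixture import GaussianMixture

# Synthetic data: two heavily overlapping blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(300, 2)),
    rng.normal(loc=2.0, scale=1.0, size=(300, 2)),
])

with warnings.catch_warnings():
    # scikit-learn warns when max_iter is hit; we inspect converged_ instead.
    warnings.simplefilter("ignore", ConvergenceWarning)
    gmm = GaussianMixture(
        n_components=2, tol=1e-3, max_iter=100, random_state=42
    ).fit(X)

print("converged:", gmm.converged_, "after", gmm.n_iter_, "EM iterations")

if not gmm.converged_:
    # Hit the iteration cap before reaching tol: retry with a larger budget.
    gmm = GaussianMixture(
        n_components=2, tol=1e-3, max_iter=500, random_state=42
    ).fit(X)
```

If converged_ is False, the result reflects the iteration cap rather than the tolerance, which is the case where raising Max Iterations to 200+ is worth trying.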
Random State (default: 42) Seed for reproducibility.
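Because EM is sensitive to initialization, a fixed seed is what makes results repeatable. A minimal check, assuming a scikit-learn GaussianMixture backend and synthetic data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # illustrative data only

# Two independent fits with the same seed start from the same initialization.
fit_a = GaussianMixture(n_components=3, random_state=42).fit(X)
fit_b = GaussianMixture(n_components=3, random_state=42).fit(X)

same_seed_match = np.allclose(fit_a.means_, fit_b.means_)
print("Same seed gives the same component means:", same_seed_match)
```

A different seed may converge to a different solution, or to the same components in a different order, so compare runs only under a fixed seed.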