PCA
Principal Component Analysis finds orthogonal directions of maximum variance in the data.
When to use:
- Need interpretable linear combinations
- Want to remove correlated features
- Data has linear structure
- Need fast, scalable solution
- A sensible first choice for most dimensionality-reduction problems
Strengths: Fast, scalable, interpretable, reversible, works on new data, few hyperparameters.
Weaknesses: Linear only, sensitive to feature scaling, assumes Gaussian-like distributions.
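The workflow above can be sketched with scikit-learn (an assumed implementation; the class `sklearn.decomposition.PCA` and the synthetic data are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic data standing in for a real dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))

# PCA is sensitive to feature scaling, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)        # shape (100, 2)

# Reversible: map back to the original feature space (approximately).
X_restored = pca.inverse_transform(X_reduced)  # shape (100, 5)

# Works on new data: reuse the fitted components.
X_new = rng.normal(size=(10, 5))
X_new_reduced = pca.transform(StandardScaler().fit(X).transform(X_new))
```

A fitted `PCA` object can be applied to any new batch with the same features, which is what makes it usable inside a deployed pipeline.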
Model Parameters
N Components (default: 2, required) Number of principal components to keep.
- 2-3: Visualization
- Based on explained variance: Keep components explaining 80-95% variance
- Upper bound: at most min(n_samples, n_features) components exist
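In scikit-learn (assumed here as the implementation), the variance-based strategy is built in: passing a float in (0, 1) as `n_components` keeps the minimum number of components needed to explain that fraction of variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Keep enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_r = pca.fit_transform(X)

# n_components_ holds the number actually kept after fitting.
kept = pca.n_components_
explained = pca.explained_variance_ratio_.sum()
```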
SVD Solver (default: "auto") Algorithm to compute singular value decomposition:
- auto: Automatically choose based on data shape (default)
- full: Exact, slow, uses standard LAPACK solver
- arpack: Faster for small n_components, iterative
- randomized: Very fast approximation for large datasets
Whiten (default: false) Transform components to have unit variance.
- false: Components scaled by explained variance (default)
- true: All components have equal variance (useful before clustering/classification)
Random State (default: 42) Seed for reproducibility (used with randomized solver).
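A sketch combining the parameters described above: the randomized solver for a large matrix, whitening so each kept component has unit variance, and a fixed seed for reproducibility (all values are illustrative, not recommendations):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 50))

pca = PCA(
    n_components=10,
    svd_solver="randomized",  # fast approximation for large datasets
    whiten=True,              # rescale components to unit variance
    random_state=42,          # seeds the randomized solver
)
X_w = pca.fit_transform(X)

# With whiten=True, each component's sample variance is ~1,
# which is convenient as input to clustering or classification.
stds = X_w.std(axis=0, ddof=1)
```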