Truncated SVD
SVD-based dimensionality reduction that works well on sparse data
Truncated SVD (also known as LSA — Latent Semantic Analysis — in NLP) decomposes the data matrix and keeps only the top singular components. Unlike PCA, it does not center the data, making it suitable for sparse matrices like TF-IDF features.
When to use:
- Sparse feature matrices (TF-IDF, bag-of-words, one-hot encoded categoricals)
- Large-scale text or document feature reduction
- When centering the data is not desirable or feasible
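As a concrete sketch of the sparse-text use case (assuming a scikit-learn-style backend, which this page does not name; the corpus is invented for illustration):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical data); TfidfVectorizer returns a sparse matrix.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are common pets",
    "mats and logs are common objects",
]
X = TfidfVectorizer().fit_transform(docs)

# TruncatedSVD consumes the sparse matrix directly -- no centering step,
# so the data never has to be densified.
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)
print(Z.shape)  # (4, 2): one 2-D coordinate per document
```

Because no centering is performed, the same code works unchanged on matrices far too large to hold in dense form.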
Input: Tabular (or sparse) data with the feature columns defined during training
Output: Projected coordinates in the reduced-dimensional space
Model Settings (set during training, used at inference)
N Components (default: 2)
Number of singular components to keep.
Algorithm (default: randomized)
SVD solver. randomized is fast and accurate enough for most uses; arpack computes an exact partial SVD but is slower on large matrices.
N Iterations (default: 5)
Number of power iterations for the randomized solver. More iterations improve accuracy at the cost of runtime; this setting has no effect when Algorithm is arpack.
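To illustrate the solver trade-off, a sketch on synthetic sparse data (the solver names are assumed to follow scikit-learn's TruncatedSVD): the randomized solver's singular values closely track arpack's exact ones, and raising the iteration count tightens the match further.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Synthetic sparse matrix standing in for a real feature matrix.
X = sparse_random(200, 100, density=0.05, random_state=0)

exact = TruncatedSVD(n_components=3, algorithm="arpack").fit(X)
fast = TruncatedSVD(n_components=3, algorithm="randomized",
                    n_iter=5, random_state=0).fit(X)

# The randomized estimates can only undershoot the exact singular values,
# and with 5 iterations they land within a few percent of them.
rel_err = np.max(np.abs(exact.singular_values_ - fast.singular_values_)
                 / exact.singular_values_)
print(rel_err < 0.05)
```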
Inference Settings
No dedicated inference-time settings. The singular vectors learned during training are used to project new data into the reduced space.
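For instance, with a scikit-learn-style backend (an assumption; the documents are invented), inference is just a projection onto the stored components:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit the vectorizer and the SVD on training documents only.
train_docs = ["the cat sat on the mat", "the dog sat on the log",
              "cats chase dogs", "dogs chase cats"]
vec = TfidfVectorizer().fit(train_docs)
svd = TruncatedSVD(n_components=2, random_state=0).fit(vec.transform(train_docs))

# At inference there is nothing to configure: transform() projects unseen
# rows onto the singular vectors learned during training (svd.components_).
new_docs = ["a cat and a dog"]
Z_new = svd.transform(vec.transform(new_docs))
print(Z_new.shape)  # (1, 2)
```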