K-Means

Fast and scalable algorithm that partitions data into k clusters by minimizing within-cluster variance.
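
Written out, "minimizing within-cluster variance" is usually taken to mean minimizing the sum of squared distances between each point and the center of its cluster. The notation below (C_j for the j-th cluster, mu_j for its center, x_i for a data point) is introduced here for illustration and does not come from this documentation.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A lower J means tighter clusters. J always decreases as k grows, which is why k cannot be chosen by minimizing J alone.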

When to use:

  • Know approximately how many clusters to expect
  • Clusters are roughly spherical and of similar size
  • Need fast results on large datasets
  • Good starting point for exploration

Strengths: Very fast, scalable to large datasets, simple and interpretable, reproducible results with a fixed random state

Weaknesses: Must specify k in advance, assumes spherical clusters, sensitive to outliers, poor with varying cluster sizes

Model Parameters

N Clusters (default: 8, required) Number of clusters to form. This is the most important parameter.

  • Too low: Merges distinct groups
  • Too high: Splits natural groups
  • Use the elbow method or silhouette analysis to find a suitable k
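
As a rough illustration of silhouette analysis, the sketch below scores two hand-made labelings of a toy dataset using only the Python standard library. The function name `silhouette_score`, the data, and the labelings are assumptions made for this example; they are not part of this tool, which may compute the score differently.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (higher is better).

    Assumes at least two distinct labels are present.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

two_clusters = [0, 0, 0, 1, 1, 1]    # matches the natural grouping
three_clusters = [0, 0, 2, 1, 1, 1]  # k too high: splits one natural group

score_k2 = silhouette_score(points, two_clusters)
score_k3 = silhouette_score(points, three_clusters)
print(f"k=2: {score_k2:.3f}, k=3: {score_k3:.3f}")
```

In practice the labelings would come from running K-Means once per candidate k; the k with the highest mean silhouette is a reasonable first choice.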

Init Method (default: "k-means++") How to initialize cluster centers:

  • k-means++: Smart initialization (default, better convergence)
  • random: Random initialization (cheaper to set up, but may converge more slowly or to a worse solution)
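
The difference between the two options can be sketched in a few lines of standard-library Python. This is a minimal illustration of the general k-means++ idea, not this tool's implementation: the function names, the toy data, and the assumption that "random" picks existing data points as starting centers are all hypothetical.

```python
import math
import random


def init_random(points, k, rng):
    """'random' style: pick k distinct data points uniformly at random."""
    return rng.sample(points, k)


def init_kmeans_pp(points, k, rng):
    """'k-means++' style: spread the initial centers out."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Sample the next center with probability proportional to d2,
        # so points far from every existing center are more likely picks.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (9.0, 9.0), (9.4, 8.8), (8.9, 9.5)]
rng = random.Random(42)
print("random:   ", init_random(points, 2, rng))
print("k-means++:", init_kmeans_pp(points, 2, rng))
```

Because an already chosen point has distance zero to itself, k-means++ never picks the same point twice, and it tends to place the first centers in different groups.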

Max Iterations (default: 300) Maximum number of iterations allowed before the algorithm stops, even if it has not yet converged.

  • 100-300: Usually sufficient
  • 500+: For difficult datasets or large k
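
To show where an iteration cap fits in, here is a minimal sketch of the standard assign-and-update loop (Lloyd's algorithm) in standard-library Python. The function `kmeans`, its return value, and the toy data are assumptions for illustration; the tool's actual implementation and stopping rule may differ, for example by using a tolerance instead of exact equality.

```python
import math


def kmeans(points, centers, max_iter=300):
    """Plain Lloyd iterations; returns (centers, labels, iterations_used)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_centers.append(old)  # keep an empty cluster's center in place
                continue
            dims = len(old)
            new_centers.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
            )
        if new_centers == centers:
            return centers, labels, iteration  # converged: no center moved
        centers = new_centers
    return centers, labels, max_iter  # stopped by the iteration cap


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
start = [(0.0, 0.0), (1.0, 0.0)]  # deliberately poor starting centers

centers, labels, used = kmeans(points, start, max_iter=300)
print(f"stopped after {used} iterations; labels = {labels}")
```

If the reported iteration count equals the cap, the run was probably cut short, and raising Max Iterations is worth trying.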

Random State (default: 42) Seed for reproducibility. Keep it the same across runs so that results are comparable.
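
The effect of a fixed seed can be sketched with Python's standard `random` module. The helper `pick_initial_centers` is hypothetical and only mimics the random part of initialization; it is not this tool's API.

```python
import random

points = [(0.0, 0.0), (0.5, 0.2), (9.0, 9.0), (9.4, 8.8), (4.0, 5.0), (5.0, 4.0)]


def pick_initial_centers(points, k, random_state):
    """Hypothetical helper: seeded random choice of k starting centers."""
    rng = random.Random(random_state)
    return rng.sample(points, k)


# Two runs with the same seed start from exactly the same centers.
run_a = pick_initial_centers(points, 2, random_state=42)
run_b = pick_initial_centers(points, 2, random_state=42)
print(run_a == run_b)
```

Since the starting centers determine the rest of the run, keeping the seed fixed makes it possible to attribute a change in the clustering to a change in the data or parameters rather than to chance.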
