DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) finds arbitrarily shaped clusters and labels outliers as noise points.

When to use:

  • Don't know number of clusters in advance
  • Clusters have arbitrary shapes (not just spherical)
  • Need to identify outliers/anomalies
  • Clusters have roughly similar densities (a single eps applies to all of them)

Strengths: Finds arbitrary shapes, detects outliers, no need to specify k, robust to noise

Weaknesses: Sensitive to parameters (eps, min_samples), struggles with varying densities, not scalable to very large datasets
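To make the terms core point, border point, and noise concrete, here is a minimal pure-Python sketch of the density-based expansion. It is an illustration only, not the implementation behind this tool: the function names `dbscan` and `region_query` are hypothetical, and it assumes that a point counts toward its own neighborhood and that noise is labeled -1.

```python
import math

NOISE = -1  # assumption: noise points are labeled -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps=0.5, min_samples=5):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # only core points expand the cluster
    return labels

# Two tight squares of four points each, plus one isolated point.
points = [(0, 0), (0, 0.3), (0.3, 0), (0.3, 0.3),
          (5, 5), (5, 5.3), (5.3, 5), (5.3, 5.3),
          (10, 0)]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)
```

The number of clusters is never passed in; it falls out of eps and min_samples. This brute-force version compares every pair of points, which is one reason the approach gets expensive on very large datasets.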

Model Parameters

Eps (default: 0.5, required) Maximum distance between two samples for them to be considered neighbors. This is the most influential parameter and usually needs tuning for each dataset.

  • Too low: Many small clusters and noise points
  • Too high: Merges distinct clusters
  • Use k-distance plot to determine optimal eps
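The k-distance idea mentioned above can be sketched in a few lines: for every point, compute the distance to its k-th nearest other point, sort those values, and look for the sharp jump (the "knee"). The helper name `k_distances` is hypothetical, and choosing k = min_samples - 1 is a common heuristic assumed here, not something this page specifies.

```python
import math

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

# Two tight squares of four points each, plus one isolated point.
points = [(0, 0), (0, 0.3), (0.3, 0), (0.3, 0.3),
          (5, 5), (5, 5.3), (5.3, 5), (5.3, 5.3),
          (10, 0)]
kd = k_distances(points, k=2)
for rank, d in enumerate(kd):
    print(rank, round(d, 3))
```

In this toy data the first eight values sit at about 0.3 and the last one jumps to about 7.07, so an eps somewhat above 0.3 separates the dense points from the outlier. On real data the curve is smoother and the knee is a judgment call.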

Min Samples (default: 5, required) Minimum number of points within eps of a point for it to count as a core point (the center of a dense region).

  • 3-5: Sensitive, more clusters
  • 5-10: Good default
  • 10+: Conservative, fewer clusters, more noise
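The effect of these ranges can be seen by counting core points at a fixed eps. The sketch below uses a hypothetical helper `core_points` and assumes that a point counts toward its own neighborhood; check how the tool you are using counts before relying on exact thresholds.

```python
import math

def core_points(points, eps, min_samples):
    """Indices of points that have at least min_samples points within eps.

    Assumption: the point itself is included in the count.
    """
    core = []
    for i, p in enumerate(points):
        count = sum(1 for q in points if math.dist(p, q) <= eps)
        if count >= min_samples:
            core.append(i)
    return core

# A dense blob of five points and a sparser group of three.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (3, 3), (3.2, 3), (3, 3.2)]

for m in (3, 5, 6):
    print(m, core_points(points, eps=0.5, min_samples=m))
```

With min_samples=3 every point is a core point, so both groups form clusters. At 5 only the dense blob qualifies and the group of three becomes noise; at 6 nothing qualifies and every point is noise.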

Metric (default: "euclidean") Distance metric:

  • euclidean: Standard distance (default)
  • manhattan: City-block distance
  • chebyshev: Maximum coordinate difference
  • Others: cosine, minkowski, etc.
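Because eps is measured in the chosen metric, the same pair of points can be neighbors under one metric and not under another. The following sketch writes out the three named metrics by hand (the function names are illustrative, not this tool's API) and checks one pair against a hypothetical eps of 4.5.

```python
def euclidean(a, b):
    """Straight-line distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0, 0), (3, 4)
eps = 4.5  # hypothetical threshold, chosen only for this example

for name, metric in [("euclidean", euclidean),
                     ("manhattan", manhattan),
                     ("chebyshev", chebyshev)]:
    d = metric(a, b)
    print(name, d, "neighbors" if d <= eps else "not neighbors")
```

Here the distances are 5.0, 7, and 4, so only the Chebyshev metric treats the two points as neighbors. Changing the metric therefore usually means re-tuning eps.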

Algorithm (default: "auto") Algorithm for nearest neighbors:

  • auto: Automatically choose best (default)
  • ball_tree: Copes better with higher dimensions than kd_tree
  • kd_tree: Fast for low dimensions
  • brute: Exact but slow (use for small datasets)

P (optional) Power parameter for Minkowski metric (2 = Euclidean, 1 = Manhattan).
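As a quick check of the relationship stated above, the Minkowski distance can be written out directly. The function name `minkowski` is illustrative, not this tool's API.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)

print(minkowski(a, b, 1))   # same value as the Manhattan distance
print(minkowski(a, b, 2))   # same value as the Euclidean distance
print(minkowski(a, b, 10))  # larger p moves toward the Chebyshev distance
```

For this pair, p = 1 gives 7 and p = 2 gives 5. As p grows, the result shrinks toward 4, the largest single coordinate difference.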
