Outlier Detection

Identify and visualize outliers in your data

Use me when you want to find the rebels - the data points that don't fit in. I'll highlight the unusual, the extreme, and the potentially problematic values. That 200-year-old customer? The $1 million typo? The measurement that's way off? I'll find them for you. Perfect for data quality checks, fraud detection, or just understanding what's weird in your dataset.

Overview

An outlier detection plot helps identify data points that deviate significantly from the rest of your dataset. These unusual values can represent errors, rare events, or important anomalies worth investigating. The visualization highlights outliers using statistical methods, making it easy to spot and analyze unusual patterns.

Best used for:

  • Data quality checks and validation
  • Identifying data entry errors
  • Detecting anomalies or unusual events
  • Understanding data distribution
  • Preparing data for analysis or modeling
  • Finding extreme values that need attention

Common Use Cases

Data Quality & Cleaning

  • Detecting data entry errors
  • Finding measurement errors
  • Validating data ranges
  • Identifying corrupted records
  • Pre-processing before analysis

Business Analytics

  • Fraud detection
  • Unusual transaction identification
  • Customer behavior anomalies
  • Sales spike or drop detection
  • Inventory discrepancies

Scientific Research

  • Experimental measurement errors
  • Unusual observations worth investigating
  • Quality control in manufacturing
  • Sensor malfunction detection
  • Research data validation

Options

Target Column

Required - Select the column to analyze for outliers.

Choose a numerical or categorical column to examine. The plot will identify values that significantly deviate from the typical pattern.

Accepts: NUMERICAL or CATEGORICAL columns

Settings

Highlight Outliers

Optional - Visually emphasize outlier points.

When enabled, outlier points are highlighted with distinct colors or markers, making them easier to identify and analyze.

Understanding Outliers

What is an Outlier?

An outlier is a data point that differs significantly from other observations. It may indicate:

  • Measurement or data entry error
  • Natural variation (rare but valid)
  • Fraud or anomaly
  • Important discovery
  • Equipment malfunction

Types of Outliers

Univariate Outliers

  • Extreme values in a single variable
  • Example: A 200-year-old person in age data
  • Detected using methods like IQR or Z-score

Multivariate Outliers

  • Unusual combinations of values
  • Example: Low income with luxury purchases
  • May appear normal individually

Contextual Outliers

  • Unusual in specific context
  • Example: A winter-like temperature recorded in summer
  • Depends on time, location, or other factors

Detection Methods

IQR Method (Interquartile Range)

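  • A widely used default; it is the same rule that box plot whiskers are based on
  • IQR = Q3 - Q1, the spread of the middle 50% of values
  • Outliers: values < Q1 - 1.5×IQR or > Q3 + 1.5×IQR
  • Robust to extreme values, because the quartiles barely move when a few values are extreme
  • Does not assume a normal distribution, so it suits moderately skewed data; heavily skewed data can still produce many flags on the long tail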
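
The IQR rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name `iqr_outliers`, the sample `ages` data, and the choice of the "inclusive" quartile convention are all assumptions. Quartile conventions differ between tools, so fences computed elsewhere may differ slightly.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    # method="inclusive" is one common convention (an assumption here).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical age column containing one implausible entry.
ages = [23, 25, 29, 31, 34, 36, 41, 45, 52, 200]
lower, upper, flagged = iqr_outliers(ages)
print(lower, upper, flagged)
```

For this sample the fences come out at 7.75 and 65.75, so only the value 200 is flagged.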

Z-Score Method

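  • Measures how many standard deviations a value lies from the mean: Z = (value - mean) / standard deviation
  • Outliers: typically |Z| > 3
  • Assumes a roughly normal distribution
  • Sensitive to extreme outliers: they inflate the mean and standard deviation, which can hide other outliers and, in small samples, even the extreme value itself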
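
A minimal Z-score sketch, under stated assumptions: the function name `zscore_outliers` and both sample datasets are hypothetical, and the population standard deviation is used (the sample standard deviation is an equally common choice). It is not necessarily how the plot computes its highlights.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    # Population standard deviation (an assumption); statistics.stdev
    # would give the sample version instead.
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, nothing can stand out
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical order amounts: 20 typical values and one extreme entry.
amounts = [48, 52, 50, 47, 53, 49, 51, 50, 46, 54,
           50, 52, 48, 51, 49, 50, 53, 47, 50, 51, 500]
print(zscore_outliers(amounts))

# Only 10 values: the extreme value inflates the mean and the standard
# deviation so much that its own Z-score stays below 3.
ages = [23, 25, 29, 31, 34, 36, 41, 45, 52, 200]
print(zscore_outliers(ages))
```

The first call flags 500. The second call flags nothing, even though 200 is clearly implausible as an age, which illustrates why the Z-score method is less reliable on small samples.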

Visual Inspection

  • Look for points far from main cluster
  • Examine distribution tails
  • Check for unexpected patterns
  • Consider domain knowledge

Interpreting the Plot

Visual Indicators

  • Highlighted points: Identified outliers
  • Position: How far from typical values
  • Clustering: Groups of outliers vs. isolated points
  • Patterns: Systematic vs. random outliers

What to Look For

  1. Isolated extreme values: Potential errors
  2. Clusters of outliers: Subpopulations or patterns
  3. Direction: High vs. low outliers
  4. Frequency: How common are outliers?

Handling Outliers

Investigation Steps

  1. Verify the value: Is it a data error?
  2. Check context: Does it make sense?
  3. Look for patterns: Are there similar cases?
  4. Consider cause: Why might this occur?

Common Actions

Remove: When outliers are clear errors

  • Data entry mistakes
  • Measurement failures
  • Corrupted records

Keep: When outliers are valid

  • Rare but real events
  • Important discoveries
  • Natural variation

Transform: When extreme values are valid but would dominate the analysis

  • Log transformation for right-skewed data
  • Winsorization (cap at threshold)
  • Separate analysis of outliers
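
The two transformations named above can be sketched as follows. The helper names `log_transform` and `winsorize`, the sample incomes, and the cap values are assumptions for illustration; in practice the caps are often taken from percentiles of the data.

```python
import math

def log_transform(values):
    """Compress a right-skewed scale with log(1 + x)."""
    # log1p handles zeros; it requires every value to be greater than -1.
    return [math.log1p(v) for v in values]

def winsorize(values, lower, upper):
    """Cap values at the given thresholds instead of removing them."""
    return [min(max(v, lower), upper) for v in values]

# Hypothetical incomes with one very large value.
incomes = [28_000, 31_000, 35_000, 40_000, 42_000, 55_000, 1_000_000]

logged = log_transform(incomes)
capped = winsorize(incomes, lower=28_000, upper=60_000)
print(capped)
```

Both approaches keep every row: the log transform shrinks the gap between typical and extreme values, while winsorization replaces the 1,000,000 entry with the cap of 60,000.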

Investigate: When uncertain

  • Gather more information
  • Check source data
  • Consult domain experts

Tips for Effective Outlier Analysis

  1. Always Investigate Before Removing:

    • Never automatically delete outliers
    • Understand why they exist
    • Document your decisions
    • Consider impact on analysis
  2. Use Multiple Methods:

    • Different methods for different data types
    • Compare IQR and Z-score results
    • Visual inspection alongside statistics
    • Consider domain-specific rules
  3. Consider Context:

    • What's normal for this data?
    • Are extreme values possible?
    • Check data collection process
    • Review time periods and conditions
  4. Document Outliers:

    • Record outlier criteria used
    • Note removed vs. kept outliers
    • Explain reasoning for decisions
    • Track impact on results
  5. Be Careful with Automatic Removal:

    • May remove important information
    • Can bias your analysis
    • Test sensitivity to outlier treatment
    • Report analysis with and without outliers

Common Outlier Patterns

Single Extreme Point

One value far from all others - often a data error or rare event.

Multiple Outliers in Same Direction

Several high or low values - may indicate subgroup or systematic issue.

Outliers on Both Ends

Both very high and very low values - check data range validity.

Clustered Outliers

Groups of unusual values - may represent valid subpopulation.

After Outlier Detection

Useful follow-up views:

  1. Box plot - See outliers in context of distribution
  2. Histogram - Understand overall data distribution
  3. Scatter plot - Check relationships with other variables
  4. Time series - See if outliers occur at specific times
  5. Summary statistics - Compare with/without outliers
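
For the last item, a small sketch of comparing summary statistics with and without flagged points. The `summarize` helper, the sample values, and the `flagged` set are hypothetical; the set stands in for whatever points were identified as outliers.

```python
import statistics

def summarize(values):
    """Basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

values = [12, 14, 15, 15, 16, 18, 19, 95]
flagged = {95}  # hypothetical: points already identified as outliers

with_outliers = summarize(values)
without_outliers = summarize([v for v in values if v not in flagged])

for name in ("n", "mean", "median", "stdev"):
    print(name, with_outliers[name], without_outliers[name])
```

In this sample the mean drops from 25.5 to roughly 15.6 once the flagged value is excluded, while the median only moves from 15.5 to 15. A large gap between the two versions is a sign that the outliers drive the result.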

Troubleshooting

Issue: Too many points marked as outliers

  • Solution: The method may be too sensitive for this data. Try a different detection method, consider whether the data naturally has high variance, and check whether it contains multiple subgroups.

Issue: No outliers detected but data looks suspicious

  • Solution: Try a different detection method or lower the Z-score threshold (for example, from 3 to 2.5). Inspect the plot visually and check for multivariate outliers, which can look normal in any single column.

Issue: Same points always flagged

  • Solution: These may be persistent data issues. Investigate the root cause, check whether they are valid extreme values, and consider handling them separately.

Issue: Outliers only in specific groups

  • Solution: Analyze the groups separately. The pattern may indicate data quality issues in that subset, or that the groups use different scales or units.
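
A sketch of analyzing groups separately, assuming the data is available as (group, value) pairs. The function names, the store labels, and the numbers are hypothetical, and the IQR rule with the "inclusive" quartile convention is one possible choice of detection method.

```python
import statistics
from collections import defaultdict

def iqr_bounds(values, k=1.5):
    """Lower and upper IQR fences (needs at least two values)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers_by_group(rows, k=1.5):
    """Apply the IQR rule within each group instead of on the pooled data."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    result = {}
    for group, values in groups.items():
        lower, upper = iqr_bounds(values, k)
        result[group] = [v for v in values if v < lower or v > upper]
    return result

# Hypothetical daily sales for two stores on very different scales.
rows = [
    ("store_a", 10), ("store_a", 11), ("store_a", 12),
    ("store_a", 11), ("store_a", 10), ("store_a", 30),
    ("store_b", 100), ("store_b", 105), ("store_b", 98),
    ("store_b", 102), ("store_b", 101), ("store_b", 99),
]

per_group = outliers_by_group(rows)

# Same rule on the pooled values, ignoring the grouping.
all_values = [value for _, value in rows]
lower, upper = iqr_bounds(all_values)
pooled = [v for v in all_values if v < lower or v > upper]

print(per_group)
print(pooled)
```

Within store_a the value 30 is flagged, but on the pooled data nothing is flagged, because the difference in scale between the two stores makes the overall spread very wide.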

Issue: Can't decide if outlier is valid

  • Solution: Check the source data and its documentation, and consult domain experts. Run the analysis with and without the value and compare the impact on your conclusions.

Issue: Outliers change analysis results significantly

  • Solution: Report a sensitivity analysis and consider robust methods (for example, the median instead of the mean). Investigate the cause of the outliers; they may be influential points worth studying.
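
As one example of a robust method, the sketch below uses the modified Z-score, which replaces the mean and standard deviation with the median and the median absolute deviation (MAD). This technique is not named elsewhere in this page and is shown only as an illustration. The function name and sample data are hypothetical, and the constant 0.6745 and the cutoff 3.5 are the values conventionally used with this score, not settings of the tool.

```python
import statistics

def modified_zscore_outliers(values, threshold=3.5):
    """Flag values using the median and MAD instead of mean and stdev."""
    median = statistics.median(values)
    # Median absolute deviation: the typical distance from the median.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # more than half the values are identical; score undefined
    # 0.6745 rescales the MAD so the score is comparable to a Z-score
    # when the data is normally distributed.
    return [v for v in values
            if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical age column with one implausible entry.
ages = [23, 25, 29, 31, 34, 36, 41, 45, 52, 200]
print(modified_zscore_outliers(ages))
```

Because the median and MAD barely react to a single extreme entry, the value 200 is flagged here even in a sample of only ten values.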
