Outlier Detection

Use me when you want to find the rebels - the data points that don't fit in. I'll highlight the unusual, the extreme, and the potentially problematic values. That 200-year-old customer? The $1 million typo? The measurement that's way off? I'll find them for you. Perfect for data quality checks, fraud detection, or just understanding what's weird in your dataset.

Overview

An outlier detection plot helps identify data points that deviate significantly from the rest of your dataset. These unusual values can represent errors, rare events, or important anomalies worth investigating. The visualization highlights outliers using statistical methods, making it easy to spot and analyze unusual patterns.

Best used for:

Data quality checks and validation
Identifying data entry errors
Detecting anomalies or unusual events
Understanding data distribution
Preparing data for analysis or modeling
Finding extreme values that need attention

Common Use Cases

Data Quality & Cleaning

Detecting data entry errors
Finding measurement errors
Validating data ranges
Identifying corrupted records
Pre-processing before analysis

Business Analytics

Fraud detection
Unusual transaction identification
Customer behavior anomalies
Sales spike or drop detection
Inventory discrepancies

Scientific Research

Experimental measurement errors
Unusual observations worth investigating
Quality control in manufacturing
Sensor malfunction detection
Research data validation

Options

Target Column

Required - Select the column to analyze for outliers.

Choose a numerical or categorical column to examine. The plot will identify values that significantly deviate from the typical pattern.

Accepts: NUMERICAL or CATEGORICAL columns

Settings

Highlight Outliers

Optional - Visually emphasize outlier points.

When enabled, outlier points are highlighted with distinct colors or markers, making them easier to identify and analyze.

Understanding Outliers

What is an Outlier?

An outlier is a data point that differs significantly from other observations. It may indicate:

Measurement or data entry error
Natural variation (rare but valid)
Fraud or anomaly
Important discovery
Equipment malfunction

Types of Outliers

Univariate Outliers

Extreme values in a single variable
Example: A 200-year-old person in age data
Detected using methods like IQR or Z-score

Multivariate Outliers

Unusual combinations of values
Example: Low income with luxury purchases
May appear normal individually

Contextual Outliers

Unusual in specific context
Example: A winter-like temperature recorded in summer
Depends on time, location, or other factors
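
A contextual outlier can be invisible to a check that runs over the whole column. The sketch below is a minimal illustration in Python (standard library only), assuming hypothetical temperature readings tagged with a season. The helper names `iqr_fences` and `flag` and the data are invented for this example and are not part of the plot itself. It applies the 1.5×IQR rule once globally and once within each context.

```python
import statistics
from collections import defaultdict

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences from the k×IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings in °C as (context, value) pairs.
readings = [("winter", t) for t in [-3, 0, 1, 2, 4, 5]]
readings += [("summer", t) for t in [22, 24, 25, 26, 27, 29, 2]]

# Global check: the 2 °C summer reading sits inside the overall range.
global_flags = flag([t for _, t in readings])

# Contextual check: group by season first, then apply the same rule.
by_season = defaultdict(list)
for season, t in readings:
    by_season[season].append(t)
contextual_flags = {season: flag(ts) for season, ts in by_season.items()}

print(global_flags)      # []
print(contextual_flags)  # {'winter': [], 'summer': [2]}
```

The same reading is unremarkable in the full column and unusual only once the season is taken into account, which is why the context variable has to be chosen before the check is run.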

Detection Methods

IQR Method (Interquartile Range)

Most common approach
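Outliers: values < Q1 - 1.5×IQR or > Q3 + 1.5×IQR, where IQR = Q3 - Q1
Robust to extreme values, because the quartiles barely move when a few points are extreme
Makes no normality assumption, although on strongly skewed data it can over-flag the long tail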
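
As a rough sketch of the rule above, the following Python snippet computes the fences with the standard library. The function name `iqr_outliers` and the sample ages are assumptions made for illustration. Quartile conventions differ between tools, so the exact fences produced by the plot may not match these numbers.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k×IQR rule."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical age column containing one implausible entry.
ages = [23, 25, 29, 31, 34, 35, 38, 41, 44, 200]
lower, upper, flagged = iqr_outliers(ages)
print(lower, upper, flagged)  # 13.375 56.375 [200]
```

Raising `k` (3.0 is a common choice for "extreme" outliers) widens the fences and flags fewer points.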

Z-Score Method

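Measures how many standard deviations a value lies from the mean
Outliers: typically |Z| > 3
Assumes approximately normally distributed data
Sensitive to extreme values: they inflate the mean and standard deviation, which can hide the very outliers you are looking for, especially in small samples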
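
A minimal Z-score sketch in Python follows, again using only the standard library. The function name `zscore_outliers`, the use of the population standard deviation, and both sample datasets are assumptions for illustration. The second call shows the sensitivity noted above: in a sample of ten, a single extreme value inflates the standard deviation so much that its own Z-score stays just under 3.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population SD; sample SD is another valid choice
    if stdev == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs((v - mean) / stdev) > threshold]

# Hypothetical sensor readings: 20 typical values and one spike.
readings = [10.1, 9.8, 10.0, 10.2, 9.9] * 4 + [25.0]
print(zscore_outliers(readings))  # [25.0]

# Hypothetical ages: the 200 is obviously wrong, yet |Z| is about 2.98.
ages = [23, 25, 29, 31, 34, 35, 38, 41, 44, 200]
print(zscore_outliers(ages))  # []
```

This is one reason to compare the Z-score result with the IQR result rather than rely on either alone.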

Visual Inspection

Look for points far from main cluster
Examine distribution tails
Check for unexpected patterns
Consider domain knowledge

Interpreting the Plot

Visual Indicators

Highlighted points: Identified outliers
Position: How far from typical values
Clustering: Groups of outliers vs. isolated points
Patterns: Systematic vs. random outliers

What to Look For

Isolated extreme values: Potential errors
Clusters of outliers: Subpopulations or patterns
Direction: High vs. low outliers
Frequency: How common are outliers?
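
Direction and frequency can also be summarized numerically. The sketch below is an illustrative Python helper (the name `outlier_summary` and the data are hypothetical) that splits IQR-flagged values into low and high and reports the share of flagged points.

```python
import statistics

def outlier_summary(values, k=1.5):
    """Split IQR-flagged values by direction and report how common they are."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    low = [v for v in values if v < lower]
    high = [v for v in values if v > upper]
    share = (len(low) + len(high)) / len(values)
    return {"low": low, "high": high, "share": share}

values = [1, 48, 50, 51, 52, 53, 55, 56, 58, 120, 130]
summary = outlier_summary(values)
print(summary["low"], summary["high"], round(summary["share"], 2))
# [1] [120, 130] 0.27
```

A high share is itself a signal: when a quarter of the data is flagged, the points may form a subpopulation rather than a set of errors.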

Handling Outliers

Investigation Steps

Verify the value: Is it a data error?
Check context: Does it make sense?
Look for patterns: Are there similar cases?
Consider cause: Why might this occur?

Common Actions

Remove: When outliers are clear errors

Data entry mistakes
Measurement failures
Corrupted records

Keep: When outliers are valid

Rare but real events
Important discoveries
Natural variation

Transform: When appropriate

Log transformation for right-skewed data
Winsorization (cap at threshold)
Separate analysis of outliers
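
The two transformations above can be sketched in a few lines of Python. This is an illustration under stated assumptions: `winsorize` caps at the 1.5×IQR fences (capping at fixed percentiles is an equally common choice), `log_transform` uses `log1p` so that zeros are allowed, and the transaction amounts are made up.

```python
import math
import statistics

def winsorize(values, k=1.5):
    """Cap values at the IQR fences instead of removing them."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]

def log_transform(values):
    """Compress a right-skewed scale; requires values greater than -1."""
    return [math.log1p(v) for v in values]

# Hypothetical transaction amounts with one suspected typo.
amounts = [12, 15, 18, 20, 22, 25, 1_000_000]

capped = winsorize(amounts)
print(capped)  # [12, 15, 18, 20, 22, 25, 34.0]

logged = log_transform(amounts)
print(round(max(logged), 1))  # 13.8
```

Both approaches keep every row in the dataset. Winsorization changes the extreme value itself, while the log transform keeps the ordering and only shrinks the distances.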

Investigate: When uncertain

Gather more information
Check source data
Consult domain experts

Tips for Effective Outlier Analysis

Always Investigate Before Removing:
- Never automatically delete outliers
- Understand why they exist
- Document your decisions
- Consider impact on analysis

Use Multiple Methods:
- Different methods for different data types
- Compare IQR and Z-score results
- Visual inspection alongside statistics
- Consider domain-specific rules

Consider Context:
- What's normal for this data?
- Are extreme values possible?
- Check data collection process
- Review time periods and conditions

Document Outliers:
- Record outlier criteria used
- Note removed vs. kept outliers
- Explain reasoning for decisions
- Track impact on results

Be Careful with Automatic Removal:
- May remove important information
- Can bias your analysis
- Test sensitivity to outlier treatment
- Report analysis with and without outliers
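
One simple way to test sensitivity is to compute the same summary statistics with and without the flagged values. The Python sketch below assumes a hypothetical age column and a set of values that were already flagged; the helper name `summarize` is illustrative.

```python
import statistics

def summarize(values):
    """Basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

ages = [23, 25, 29, 31, 34, 35, 38, 41, 44, 200]
flagged = {200}  # assumed result of an earlier outlier check

with_outliers = summarize(ages)
without_outliers = summarize([v for v in ages if v not in flagged])

print(with_outliers["mean"], with_outliers["median"])  # 50.0 34.5
print(round(without_outliers["mean"], 2), without_outliers["median"])  # 33.33 34
```

Here the mean shifts by more than 16 years while the median moves by half a year, which shows how much a conclusion based on the mean would depend on a single record.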

Common Outlier Patterns

Single Extreme Point

One value far from all others - often a data error or rare event.

Multiple Outliers in Same Direction

Several high or low values - may indicate subgroup or systematic issue.

Outliers on Both Ends

Both very high and very low values - check data range validity.

Clustered Outliers

Groups of unusual values - may represent valid subpopulation.

After Outlier Detection

Box plot - See outliers in context of distribution
Histogram - Understand overall data distribution
Scatter plot - Check relationships with other variables
Time series - See if outliers occur at specific times
Summary statistics - Compare with/without outliers

Troubleshooting

Issue: Too many points marked as outliers

Solution: The method may be too sensitive. Try a different detection method, consider whether the data naturally has high variance, and check whether it contains multiple subgroups.

Issue: No outliers detected but data looks suspicious

Solution: Try a different detection method or lower the Z-score threshold. Inspect the data visually, and check for multivariate outliers that look normal one column at a time.

Issue: Same points always flagged

Solution: These may be persistent data issues. Investigate the root cause, check whether they are valid extreme values, and consider handling them separately.

Issue: Outliers only in specific groups

Solution: Analyze the groups separately. Outliers confined to one group may indicate data quality issues in that subset, so check whether it uses a different scale or unit.

Issue: Can't decide if outlier is valid

Solution: Check the source data and documentation, and consult domain experts. Run the analysis with and without the value and see how much the conclusions change.

Issue: Outliers change analysis results significantly

Solution: Report a sensitivity analysis and consider robust methods. Investigate what causes the outliers, since they may be influential points worth studying.
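
As one example of a robust method, the sketch below uses the modified Z-score, which replaces the mean and standard deviation with the median and the median absolute deviation (MAD). This technique is not described elsewhere on this page. The constant 0.6745 and the cutoff 3.5 are the conventional choices for it, and the function name and data are assumptions for illustration.

```python
import statistics

def modified_zscore_outliers(values, threshold=3.5):
    """Flag values using median and MAD instead of mean and standard deviation."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # more than half the values are identical; the score is undefined
    return [v for v in values
            if abs(0.6745 * (v - median) / mad) > threshold]

# Hypothetical ages with one implausible entry.
ages = [23, 25, 29, 31, 34, 35, 38, 41, 44, 200]
print(modified_zscore_outliers(ages))  # [200]
```

Because the median and MAD are barely affected by the extreme value, it cannot hide itself by inflating the scale it is measured against.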
