Documentation (English)

Correlation

Visualize correlations between multiple numerical variables

Use me when you want to see all your variables' relationships at once, in one beautiful grid. I'll show you which variables are best friends (high positive correlation), which are enemies (negative correlation), and which ignore each other (near zero). Essential for data exploration - I'll reveal the hidden connections in your dataset like a relationship therapist for numbers.

Overview

A correlation plot (correlation matrix) displays the correlation coefficients between multiple numerical variables in a heatmap format. Each cell shows how strongly two variables are related, with values ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation). Color intensity makes patterns immediately visible, helping identify relationships, redundancies, and potential predictors.

Best used for:

  • Exploring relationships between multiple variables
  • Feature selection for machine learning
  • Identifying multicollinearity in regression
  • Understanding dataset structure
  • Finding redundant or highly correlated features
  • Detecting unexpected relationships in data

Common Use Cases

Data Science & Machine Learning

  • Feature selection and engineering
  • Multicollinearity detection
  • Variable redundancy analysis
  • Understanding feature relationships
  • Identifying potential predictors

Statistical Analysis

  • Exploratory data analysis (EDA)
  • Understanding variable dependencies
  • Hypothesis generation
  • Data validation and quality checks
  • Relationship strength assessment

Business Analytics

  • Customer behavior patterns
  • Product affinity analysis
  • KPI relationship analysis
  • Marketing channel effectiveness
  • Sales driver identification

Options

Columns of Interest

Required - Select numerical columns to analyze.

Choose 2 or more numerical columns to calculate correlations between all pairs. The plot will show an N×N matrix where N is the number of selected columns.

Settings

Annotate Segments With Value

Optional - Display correlation values in each cell.

When enabled, shows the numerical correlation coefficient in each cell, making exact values easy to read.

Understanding Correlation Values

Correlation Coefficient Range

  • +1.0: Perfect positive correlation
  • +0.7 to +0.9: Strong positive correlation
  • +0.4 to +0.6: Moderate positive correlation
  • +0.1 to +0.3: Weak positive correlation
  • 0.0: No correlation
  • -0.1 to -0.3: Weak negative correlation
  • -0.4 to -0.6: Moderate negative correlation
  • -0.7 to -0.9: Strong negative correlation
  • -1.0: Perfect negative correlation
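
The bands above can be turned into a small helper for labelling values. This is only a sketch: the function name is hypothetical, and it assumes the bands are contiguous, with cut-offs at 0.1, 0.4, and 0.7 on the absolute value. These are rules of thumb rather than fixed statistical thresholds.

```python
def describe_correlation(r):
    """Map a coefficient in [-1, 1] to a rough verbal label (rule-of-thumb cut-offs)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.7:      # assumed cut-off for "strong"
        strength = "strong"
    elif size >= 0.4:      # assumed cut-off for "moderate"
        strength = "moderate"
    elif size >= 0.1:      # assumed cut-off for "weak"
        strength = "weak"
    else:
        return "no meaningful correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

for value in (0.85, -0.5, 0.2, 0.0, -1.0):
    print(f"{value:+.2f}: {describe_correlation(value)}")
```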

Positive Correlation

Variables move together in the same direction:

  • When one increases, the other tends to increase
  • Example: Height and weight
  • Example: Study time and test scores

Negative Correlation

Variables move in opposite directions:

  • When one increases, the other tends to decrease
  • Example: Speed and travel time
  • Example: Price and demand

No Correlation

Variables have no linear relationship:

  • Changes in one don't predict changes in the other
  • Value near 0
  • May still have non-linear relationships

Interpreting the Correlation Matrix

Diagonal

  • Always shows 1.0 (perfect self-correlation)
  • Each variable is perfectly correlated with itself
  • Usually displayed in distinct color

Symmetry

  • Matrix is symmetric across diagonal
  • Correlation(A, B) = Correlation(B, A)
  • Only need to examine one triangle

Color Intensity

  • Darker/brighter colors indicate stronger correlations
  • Near-zero correlations appear as neutral color
  • Look for patterns and clusters

Tips for Effective Correlation Analysis

  1. Variable Selection:

    • Include relevant numerical variables
    • Remove variables with no variance
    • Consider standardizing beforehand only if scales differ greatly
    • Limit to 15-20 variables for readability
  2. Interpretation Guidelines:

    • Correlation ≠ causation
    • Only measures linear relationships
    • Outliers can distort correlations
    • Consider sample size and significance
  3. Multicollinearity Detection:

    • Look for high correlations (> 0.8 or < -0.8)
    • In regression, remove one predictor from each highly correlated pair
    • Or use dimensionality reduction (PCA)
    • Keep variables with different information
  4. Feature Selection:

    • Identify features highly correlated with target
    • Remove redundant features (highly correlated with each other)
    • Balance between information and collinearity
    • Consider domain knowledge
  5. Data Quality Checks:

    • Unexpected correlations may indicate data issues
    • Very low correlations everywhere may suggest problems
    • Check for data entry errors or wrong units
    • Verify variable relationships make sense
  6. Visualization Tips:

    • Enable value annotations for precise reading
    • Use color scale appropriate for audience
    • Consider showing only lower/upper triangle
    • Reorder variables to group related ones

Common Correlation Patterns

Strong Positive Clusters

Groups of variables all positively correlated - may indicate redundancy.

Mixed Relationships

Complex pattern of positive and negative correlations.

Block Diagonal Pattern

Distinct groups of correlated variables with weak between-group correlation.

Target Correlation

One row/column showing which features correlate with target variable.

Statistical Considerations

Sample Size

  • Small samples (<30): Correlations unreliable
  • Medium samples (30-100): Use with caution
  • Large samples (>100): More reliable estimates
  • Very large samples: Even tiny correlations may be "significant"
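
One way to see the effect of sample size is an approximate confidence interval for r based on the Fisher z-transformation. The sketch below assumes bivariate normal data and uses a hypothetical helper name; it illustrates the idea and is not a feature of the plot itself.

```python
import math
from statistics import NormalDist

def correlation_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                      # Fisher transformation of r
    standard_error = 1 / math.sqrt(n - 3)  # approximate standard error of z
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    low = math.tanh(z - critical * standard_error)
    high = math.tanh(z + critical * standard_error)
    return low, high

# The same observed r = 0.3 is far less certain in a small sample.
for n in (20, 100, 10_000):
    low, high = correlation_confidence_interval(0.3, n)
    print(f"n = {n:>6}: r = 0.30, 95% interval [{low:+.2f}, {high:+.2f}]")

# With a very large sample, even a tiny r has an interval that excludes zero.
tiny_low, tiny_high = correlation_confidence_interval(0.03, 10_000)
print(f"n =  10000: r = 0.03, 95% interval [{tiny_low:+.3f}, {tiny_high:+.3f}]")
```

An interval that excludes zero only says the correlation is unlikely to be exactly zero; whether a coefficient of 0.03 matters in practice is a separate question.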

Assumptions

  • Linear relationship between variables
  • Continuous or ordinal numerical data
  • No extreme outliers
  • Bivariate normal distribution (for significance tests)

Limitations

  • Only detects linear relationships, so non-linear ones may be missed
  • Outliers can heavily influence results
  • Does not imply causation

After Correlation Analysis

  1. Scatter plots - Visualize specific pairwise relationships
  2. Partial correlation - Remove effect of confounding variables
  3. Regression analysis - Model relationships
  4. Principal Component Analysis - Reduce dimensions
  5. Cluster analysis - Group correlated variables

Example Scenarios

Machine Learning Feature Selection

Identify features correlated with target and remove redundant features.

Financial Data Analysis

Find relationships between economic indicators.

Healthcare Data

Understand relationships between patient measurements and outcomes.

Marketing Analytics

Identify which metrics move together for campaign optimization.

Troubleshooting

Issue: All correlations are very weak (near 0)

  • Solution: Variables may truly be independent, or relationships may be non-linear. Check scatter plots. Verify data quality and that variables are numerical.

Issue: Correlation matrix is not symmetric

  • Solution: This should not happen. Check data and report as bug. Matrix must be symmetric by definition.

Issue: Perfect correlations (1.0) off the diagonal

  • Solution: Two variables are perfectly linearly related - one is redundant or they're measuring the same thing. Remove one.

Issue: Cannot see color differences

  • Solution: Enable "Annotate Segments With Value" to see exact numbers. Consider different color scale.

Issue: Too many variables to read

  • Solution: Reduce number of columns (aim for 10-15; readability drops off beyond 15-20), or create multiple correlation matrices for subsets of variables.

Issue: Unexpected correlations appear

  • Solution: May indicate data issues, confounding variables, or interesting relationships. Investigate with scatter plots and domain expertise.

Issue: Negative correlation where positive expected (or vice versa)

  • Solution: Verify variable coding (e.g., whether satisfaction is coded 1=bad, 5=good or the reverse). Check for data entry errors.

Issue: Very strong correlations everywhere

  • Solution: Variables may be measuring similar constructs. Check if variables need to be transformed or if multicollinearity is present.
