Documentation (English)

SHAP Dependence Plot

Understand how features affect model predictions

Use me when you need to peek inside your ML model's brain and see exactly how it's making decisions. I'm like an X-ray for machine learning - showing you how each feature value pushes predictions up or down, where the magic thresholds are, and which features are secretly working together. Essential for explaining "why did the model predict that?" to anyone who asks.

Overview

A SHAP (SHapley Additive exPlanations) dependence plot is a scatter plot that shows the relationship between a feature's value and its impact on model predictions. Each point represents a single prediction, with the x-axis showing the feature value and the y-axis showing the SHAP value (how much that feature value affected the prediction). Color can represent another feature to reveal interaction effects.
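To make the x-axis/y-axis pairing concrete, here is a minimal sketch of the data a dependence plot displays. For a linear model with independent features, SHAP values have a closed form (coefficient times the feature's deviation from its mean), so the plot is an exact straight line. The feature count, coefficients, and names below are illustrative, not part of any specific tool:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))          # three hypothetical features
w = np.array([2.0, -1.0, 0.5])         # linear model coefficients (illustrative)
b = 0.3
preds = X @ w + b

# For a linear model, the SHAP value of feature j on row i is
# w[j] * (X[i, j] - mean(X[:, j])); the base value is the average prediction.
base_value = X.mean(axis=0) @ w + b
shap_values = (X - X.mean(axis=0)) * w

# Dependence-plot data for feature 0:
#   x-axis: X[:, 0]        y-axis: shap_values[:, 0]
```

The SHAP efficiency property holds here: per row, the SHAP values plus the base value reproduce the prediction exactly.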

Best used for:

  • Understanding how individual features affect predictions
  • Identifying non-linear relationships in model behavior
  • Discovering feature interactions and dependencies
  • Explaining model decisions to stakeholders
  • Debugging unexpected model behavior
  • Finding threshold effects and decision boundaries

Common Use Cases

Model Interpretation & Explainability

  • Explaining credit scoring decisions (how income affects approval)
  • Understanding medical diagnosis predictions
  • Interpreting risk assessment models
  • Regulatory compliance and model transparency
  • Customer-facing explanations of automated decisions

Model Development & Debugging

  • Validating expected feature relationships
  • Identifying unexpected non-linear patterns
  • Discovering problematic feature interactions
  • Detecting data quality issues through unusual patterns
  • Comparing model behavior across different segments

Feature Engineering & Selection

  • Understanding which features drive predictions
  • Identifying redundant or low-impact features
  • Discovering threshold values for binning
  • Finding optimal feature transformations
  • Prioritizing features for further analysis

Options

Feature Column

Required - Select the feature to analyze.

Choose the model feature whose impact you want to understand. This feature's values will be plotted on the x-axis, and its SHAP values (impact on predictions) on the y-axis.

SHAP Values Column

Required - Select the column containing SHAP values.

This should be the SHAP value column corresponding to the selected feature. SHAP values indicate how much each feature value contributes to pushing the prediction higher or lower.

Color By (Interaction Feature)

Optional - Color points by another feature to reveal interactions.

When specified, points are colored by this feature's values, making it easy to see how two features interact. For example, color by "Age" while analyzing "Income" to see if income's effect varies by age.

Point Size

Optional - Vary point size by another variable.

Useful for showing a third dimension, such as sample weight or prediction confidence.

Settings

Show Trend Line

Optional - Add a smoothed trend line showing average relationship.

Shows the general pattern of how the feature affects predictions, making non-linear relationships easier to see.

Trend Method

Optional - Choose smoothing method for trend line.

Options:

  • LOWESS - Locally weighted smoothing (default, flexible)
  • Rolling Mean - Moving average (simpler, less flexible)
  • Linear - Straight line fit
  • Polynomial - Curved polynomial fit
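The Rolling Mean option can be sketched with plain NumPy: sort points by feature value, then smooth the SHAP values with a moving average. The function name and window size are illustrative:

```python
import numpy as np

def rolling_mean_trend(x, shap_vals, window=5):
    """Sort points by feature value and smooth SHAP values with a moving average."""
    order = np.argsort(x)
    xs, ys = x[order], shap_vals[order]
    kernel = np.ones(window) / window
    smooth = np.convolve(ys, kernel, mode="valid")
    # Align each smoothed value with the centre of its window
    xc = xs[window // 2 : window // 2 + smooth.size]
    return xc, smooth
```

LOWESS behaves similarly but reweights each local fit, which makes it more flexible at the cost of more computation.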

Jitter

Optional - Add random offset to reduce overplotting.

Options:

  • None (0.0), Low (0.05), Medium (0.1), High (0.2)

Useful when many points have identical or similar values.
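As a sketch of what the presets do, jitter adds uniform noise scaled to the feature's value range; the function name and seeding are illustrative:

```python
import numpy as np

def add_jitter(values, amount=0.05, seed=None):
    """Offset each point by uniform noise scaled to the value range.

    `amount` mirrors the presets above: 0.0 (None), 0.05 (Low),
    0.1 (Medium), 0.2 (High).
    """
    rng = np.random.default_rng(seed)
    span = float(values.max() - values.min())
    return values + rng.uniform(-amount, amount, size=values.shape) * span
```

Jitter is applied only to the displayed position, never to the underlying SHAP values.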

Point Opacity

Optional - Transparency of data points.

Options:

  • 100%, 80%, 60%, 40%, 20%

Lower opacity helps see density in overlapping regions.

Point Size

Optional - Size of scatter plot points.

Options:

  • Small (3px), Medium (5px), Large (8px), Extra Large (12px)

Color Scale

Optional - Color scheme for interaction feature.

Show Zero Reference Line

Optional - Display horizontal line at SHAP value = 0.

Helpful for seeing which feature values increase (positive SHAP) vs decrease (negative SHAP) predictions.

Understanding SHAP Dependence Plots

Reading the Plot

Y-axis (SHAP Value):

  • Positive values: Feature value increases the prediction
  • Negative values: Feature value decreases the prediction
  • Magnitude: Larger absolute values = stronger effect

X-axis (Feature Value):

  • The actual values of the feature being analyzed
  • Spread shows the range of values in your data

Color (Interaction Feature):

  • Reveals how another feature modifies the relationship
  • Vertical color gradients indicate interaction effects
  • Horizontal color bands suggest independence

Common Patterns

Linear Relationship:

  • Points form a straight diagonal line
  • Feature has consistent, proportional effect
  • Example: Higher credit score → proportionally higher approval probability

Non-linear Relationship:

  • Curved or bent pattern
  • Feature effect changes across its range
  • Example: Age effect saturates after 40 years

Threshold Effect:

  • Vertical jump or step in the pattern
  • Sharp change at specific value
  • Example: Income above $50k dramatically increases approval

Interaction Effect:

  • Vertical color separation (same x-value, different SHAP values for different colors)
  • Feature effect depends on another feature
  • Example: Income matters more for young applicants (shown by color)

No Effect:

  • Horizontal line at SHAP = 0
  • Feature doesn't influence predictions
  • Consider removing from model
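A threshold effect like the one described above can also be located numerically. This is a rough sketch (the largest jump between adjacent sorted points), not a formal change-point method:

```python
import numpy as np

def find_threshold(x, shap_vals):
    """Return the feature value where SHAP jumps the most between neighbours."""
    order = np.argsort(x)
    xs, ys = x[order], shap_vals[order]
    jumps = np.abs(np.diff(ys))
    i = int(np.argmax(jumps))
    # Midpoint between the two feature values straddling the largest jump
    return 0.5 * (xs[i] + xs[i + 1])
```

For noisy data, apply a trend line first and look for jumps in the smoothed curve instead of the raw points.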

SHAP Value Interpretation

Absolute SHAP Value: |SHAP| = 0.5 means the feature changes the prediction by 0.5 units in the model's output space.

For classification models:

  • Output space is log-odds
  • SHAP = 1.0 multiplies the odds by e ≈ 2.72 (roughly a 2.7x change)

For regression models:

  • Output space is the target variable units
  • SHAP = 10 means +10 units to the prediction
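The log-odds interpretation above can be checked directly: adding a SHAP value in log-odds space multiplies the odds by exp(SHAP), and the resulting probability follows from the logistic function. The helper name is illustrative:

```python
import numpy as np

# Adding a SHAP value in log-odds space multiplies the odds by exp(SHAP)
odds_multiplier = np.exp(1.0)   # the "roughly 2.7x" figure above

def prob_after_shap(base_logodds, shap_value):
    """Probability implied by the base log-odds plus one feature's SHAP value."""
    z = base_logodds + shap_value
    return 1.0 / (1.0 + np.exp(-z))
```

Note that the same SHAP value shifts the probability by different amounts depending on the base log-odds, which is why log-odds (not probability) is the natural additive space.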

Tips for Effective SHAP Dependence Plots

  1. Start with Important Features:

    • Use global feature importance to identify top features
    • Focus analysis on features that matter most
    • Don't waste time on negligible features
  2. Choose Interaction Features Wisely:

    • Use SHAP interaction values to find strong interactions
    • Color by features suspected to interact
    • Try different combinations to uncover patterns
  3. Look for Validation Opportunities:

    • Compare patterns to domain knowledge
    • Unexpected patterns may indicate bugs or insights
    • Validate threshold effects with business rules
  4. Combine with Other Plots:

    • Pair with global feature importance to choose which features to analyze
    • Contrast with partial dependence plots to separate average effects from heterogeneity
    • Cross-check against a plain scatter of feature vs. target to distinguish model behavior from raw correlation
  5. Handle Overplotting:

    • Reduce point opacity for dense regions
    • Add jitter for discrete features
    • Use trend lines to show general pattern
    • Consider hexbin aggregation for very large datasets
  6. Interpret with Context:

    • Remember SHAP shows marginal contributions
    • Feature effects are relative to average prediction
    • Consider feature distributions when generalizing
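The hexbin aggregation mentioned in tip 5 can be approximated with a rectangular 2-D histogram; true hexagonal binning needs a plotting library, so this is only a stand-in sketch:

```python
import numpy as np

def grid_aggregate(x, shap_vals, bins=20):
    """Collapse a dense scatter into per-cell counts on a rectangular grid."""
    counts, x_edges, y_edges = np.histogram2d(x, shap_vals, bins=bins)
    return counts, x_edges, y_edges
```

Plot the counts as a heatmap instead of individual points once the dataset grows past a few hundred thousand rows.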

SHAP Dependence vs Other Plots

vs Partial Dependence Plot (PDP)

  • SHAP Dependence: Shows individual predictions, reveals heterogeneity, includes interactions
  • PDP: Shows average effect, smoother, ignores interactions
  • Choose SHAP: When interactions matter or you need instance-level detail

vs Feature Importance

  • SHAP Dependence: Shows how feature affects predictions (direction and magnitude)
  • Feature Importance: Shows which features matter most (ranking only)
  • Use together: Importance to identify key features, dependence to understand them

vs Scatter Plot

  • SHAP Dependence: Y-axis is model impact, explains predictions
  • Regular Scatter: Y-axis is target variable, shows correlations
  • SHAP advantage: Directly shows model behavior, not just data correlation

Example Scenarios

Credit Scoring Model

Feature: Credit Score, Color by: Age. Shows how credit score affects approval, and whether young vs. old applicants are treated differently.

House Price Prediction

Feature: Square Footage, Color by: Location. Reveals whether size matters more in expensive neighborhoods.

Medical Diagnosis

Feature: Blood Pressure, Color by: Medication Use. Shows threshold effects and how medication modifies risk assessment.

Customer Churn Prediction

Feature: Account Age, Color by: Contract Type. Identifies when customers become loyal and how contract type affects retention.

Interpreting Interactions

Strong Interaction Indicators:

  • Vertical color separation: Same feature value → different SHAP values based on interaction feature
  • Color-dependent slopes: Relationship angle changes with color
  • Threshold shifts: Decision boundaries move based on interaction feature

No Interaction Indicators:

  • Horizontal color bands: Colors layer on top without vertical spread
  • Parallel patterns: All colors follow same curve
  • Uniform mixing: Colors evenly distributed at each x-value

Example:

If an "Income" dependence plot colored by "Age" shows:

  • Young (blue) points clustered at low SHAP values
  • Old (red) points clustered at high SHAP values
  • At the same income level

This indicates that Age modifies Income's effect - a strong interaction.
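This pattern can be reproduced exactly with a toy multiplicative model. For f(income, age) = income × age with two independent features, the Shapley value of income is 0.5 · (income − E[income]) · (age + E[age]), so at any fixed income the SHAP value shifts with age, producing the vertical color separation described above. The feature names and ranges are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
income = rng.uniform(20, 100, size=500)   # hypothetical x-axis feature
age = rng.uniform(18, 70, size=500)       # hypothetical colour feature
m_inc, m_age = income.mean(), age.mean()

# Exact two-player Shapley values for the toy model f = income * age:
phi_income = 0.5 * (income - m_inc) * (age + m_age)
phi_age = 0.5 * (age - m_age) * (income + m_inc)

# Efficiency: contributions sum to f(x) minus the baseline E[income] * E[age]
f = income * age
baseline = m_inc * m_age
```

Because phi_income contains the factor (age + E[age]), points at the same income level spread vertically by age, which is exactly the "strong interaction" signature.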

Troubleshooting

Issue: Points are too dense to see patterns

  • Solution: Reduce point opacity to 20-40%, add jitter for discrete features, enable trend line, or sample data if very large.

Issue: Trend line doesn't fit the pattern well

  • Solution: Change trend method (LOWESS for complex curves, Linear for simple relationships, Polynomial for smooth curves).

Issue: Can't see if interactions exist

  • Solution: Color by suspected interaction features, use SHAP interaction values to identify candidates, try multiple color-by options.

Issue: SHAP values seem too large or small

  • Solution: Check model output scale (log-odds for classification, target units for regression), verify SHAP values are correctly calculated, consider the base value.

Issue: Pattern contradicts domain knowledge

  • Solution: Could indicate data quality issues, model bugs, or genuine insights. Investigate with data experts, check for data leakage, validate with other instances.

Issue: Categorical feature is hard to read

  • Solution: Encode categorical values as integers, add jitter to separate points, or create separate dependence plots for each category level.

Issue: Feature has no apparent effect (flat line)

  • Solution: Verify feature is actually used in model, check if effect is mediated through other features, consider removing if truly irrelevant.

Issue: Outliers compress the main pattern

  • Solution: Filter extreme SHAP values, use y-axis limits to zoom, or investigate outliers separately as they may be interesting edge cases.
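Filtering extreme SHAP values as suggested can be sketched with percentile clipping; the function name and default percentiles are illustrative:

```python
import numpy as np

def clip_shap_outliers(shap_vals, lo=1.0, hi=99.0):
    """Clip SHAP values to percentile limits so outliers stop compressing the y-axis."""
    lo_v, hi_v = np.percentile(shap_vals, [lo, hi])
    return np.clip(shap_vals, lo_v, hi_v)
```

Prefer clipping only the displayed axis range over discarding the rows entirely, since clipped points may still be the interesting edge cases.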

