Scatter Plot
Visualize relationships between two numerical variables
Use me when you want to see if two things dance together. I'll show you if values move in sync (positive correlation), do opposite things (negative correlation), or just ignore each other. Perfect for finding patterns in the chaos - like whether study time really does improve test scores, or if temperature affects ice cream sales.
Overview
A scatter plot displays the values of two variables, usually numerical, for a set of observations, showing the relationship or correlation between them. Each point on the graph represents one observation, with coordinates determined by its values for the two variables.
Best used for:
- Identifying correlations and patterns between variables
- Detecting outliers and clusters in data
- Analyzing relationships in scientific and statistical data
- Comparing multiple dimensions when colored or sized by additional variables
Common Use Cases
Business Analytics
- Sales vs. Marketing Spend correlation
- Customer age vs. purchase frequency
- Product price vs. sales volume
Scientific Research
- Temperature vs. pressure measurements
- Drug dosage vs. patient response
- Environmental measurements correlation
Quality Control
- Manufacturing tolerances
- Defect rate analysis
- Process parameter relationships
Settings
X-Axis
Required - Select the numerical column for the horizontal axis.
This determines the independent variable in your analysis. Choose a column that you want to use as the predictor or explanatory variable.
Y-Axis
Required - Select the numerical column for the vertical axis.
This determines the dependent variable in your analysis. Choose a column that you want to analyze in relation to the X-axis.
Color By (Optional)
Group and color points by a categorical or numerical column.
Use cases:
- Categorical: Show different product categories, regions, or customer segments
- Numerical: Display a color gradient representing a third dimension
Size By (Optional)
Vary point sizes based on a numerical column value.
When combined with color, this creates a bubble chart that can display up to 4 dimensions of data simultaneously.
Hover Data (Optional)
Additional columns to display when hovering over points.
Include relevant context like IDs, names, dates, or other descriptive information to make tooltips more informative.
Advanced Settings
Marker Settings
- Size: Adjust the base size of scatter points (default: 8px)
- Opacity: Control point transparency (0-1, default: 0.8)
- Symbol: Change marker shape (circle, square, diamond, etc.)
Axis Settings
- Logarithmic Scale: Enable log scale for skewed data
- Axis Range: Set custom min/max values
- Grid Lines: Toggle visibility and styling
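To illustrate why a logarithmic scale helps with skewed data, here is a minimal sketch of the transformation a log axis applies. The function name `log10_positions` and the sample values are hypothetical, and this is not necessarily how the chart computes its axis internally.

```python
import math


def log10_positions(values):
    """Map strictly positive values to their base-10 logarithms."""
    if any(v <= 0 for v in values):
        # Zero and negative values have no logarithm, so they cannot be placed.
        raise ValueError("a log scale requires strictly positive values")
    return [math.log10(v) for v in values]


# Hypothetical values spanning four orders of magnitude.
revenue = [1, 10, 100, 1000, 10000]
positions = log10_positions(revenue)
# On a linear axis the first four values crowd near zero;
# after the transform, consecutive positions are one unit apart.
```

Note that a column containing zeros or negative numbers cannot be shown on a log axis without filtering or shifting the data first.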
Trendline
Add a line of best fit to visualize the correlation:
- Linear: Straight line showing average relationship
- Polynomial: Curved line for non-linear relationships
- LOWESS: Locally weighted smoothing for complex patterns
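As a rough illustration of what a linear trendline represents, the sketch below fits an ordinary least-squares line in plain Python. It assumes the linear option uses a least-squares fit, which is the common choice but is not confirmed here. The function name `linear_trendline` and the sample data are hypothetical.

```python
def linear_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Squared deviations of x, and co-deviations of x and y, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
slope, intercept = linear_trendline(hours, scores)
# A positive slope means scores tend to rise with hours studied.
```

A polynomial or LOWESS trendline follows the same idea but fits a curve rather than a single straight line.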
Tips for Effective Scatter Plots
- Check for Correlation: Look for patterns - positive slope indicates positive correlation, negative slope indicates negative correlation
- Identify Outliers: Points far from the main cluster may indicate anomalies or special cases worth investigating
- Scale Appropriately: Use logarithmic scales when data spans multiple orders of magnitude
- Limit Points: For datasets with thousands of points, consider:
  - Sampling the data
  - Using opacity to show density
  - Aggregating into hexbins or 2D histograms
- Add Context: Use color and size to add extra dimensions without cluttering the visualization
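The aggregation idea in the tips above can be sketched as a simple 2D histogram: instead of drawing every point, count how many fall into each cell of a grid and plot the counts. This is a minimal illustration using square cells rather than hexbins. The function name `bin_points`, its parameters, and the sample data are hypothetical.

```python
from collections import Counter


def bin_points(xs, ys, x_min, x_max, y_min, y_max, bins=10):
    """Count points per cell of a bins-by-bins grid over the given ranges."""
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("ranges must have positive width")
    counts = Counter()
    x_width = (x_max - x_min) / bins
    y_width = (y_max - y_min) / bins
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted range
        # Clamp so that points exactly on the upper edge land in the last cell.
        col = min(int((x - x_min) / x_width), bins - 1)
        row = min(int((y - y_min) / y_width), bins - 1)
        counts[(col, row)] += 1
    return counts


# Hypothetical points forming two small groups.
xs = [0.1, 0.2, 0.15, 0.9, 1.0]
ys = [0.1, 0.15, 0.2, 0.9, 1.0]
grid = bin_points(xs, ys, 0.0, 1.0, 0.0, 1.0, bins=2)
# grid maps (column, row) to the number of points in that cell.
```

Each cell count can then be shown as a color intensity, which stays readable no matter how many points the dataset contains.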
Example Scenarios
Positive Correlation
When both variables increase together (e.g., study hours vs. test scores).
Negative Correlation
When one variable increases as the other decreases (e.g., price vs. demand).
No Correlation
When the variables show no consistent relationship (points scattered randomly).
Clustered Data
When distinct groups exist in the data.
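The first three scenarios can be quantified with the Pearson correlation coefficient, which ranges from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). The sketch below computes it in plain Python. The function name `pearson_r` and the sample data are hypothetical and chosen to give clean results.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises
r_none = pearson_r(x, [3, 1, 4, 1, 3])        # no linear trend
```

A coefficient near zero only rules out a linear relationship. Curved or clustered patterns can still be present, which is why looking at the plot itself remains useful.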
Troubleshooting
Issue: Too many overlapping points make patterns hard to see
- Solution: Reduce point opacity (try 0.3-0.5), add jitter, use smaller point sizes, or consider hexbin/density plots for very large datasets.
Issue: Can't see correlation in the data
- Solution: Add a trend line or regression line, check if a log scale is needed, or remove extreme outliers that compress the main pattern.
Issue: Points are too small or too large
- Solution: Adjust the marker size in settings so that points are visible but not overwhelming (typically 4-8px for most uses).
Issue: Categories are hard to distinguish by color
- Solution: Use a colorblind-friendly palette, ensure sufficient color contrast, limit to 5-7 categories, or use different shapes/symbols.
Issue: Outliers dominate the view
- Solution: Filter extreme outliers, use axis limits to zoom into main cluster, or use log scale if outliers span orders of magnitude.
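One common way to decide which points count as extreme is the interquartile range (IQR) rule: treat values more than 1.5 IQRs beyond the quartiles as outliers. The sketch below applies that rule to the Y values before plotting. The 1.5 multiplier is a convention rather than a setting of this chart, and the function name `iqr_bounds` and the sample data are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits located k IQRs outside the first and third quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical measurements with one extreme Y value.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [10, 12, 11, 13, 12, 11, 14, 12, 95]

low, high = iqr_bounds(ys)
# Keep only the (x, y) pairs whose y value lies inside the limits.
kept = [(x, y) for x, y in zip(xs, ys) if low <= y <= high]
```

The same limits can be used as a custom axis range instead of a filter, which zooms into the main cluster while leaving the data unchanged.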
Issue: Can't identify specific data points
- Solution: Enable hover tooltips with detailed information, add labels to important points, or use interactive zoom/pan features.
Issue: Need to show third or fourth variable
- Solution: Use color for a third variable and size for a fourth. Avoid using both unless necessary, as the chart can become cluttered.
Issue: Trend line doesn't fit the data well
- Solution: Try different regression methods (linear, polynomial, LOWESS), check for non-linear relationships, or fit separate lines for different groups.