Histogram
Visualize the distribution of numerical data
Use me when you want to see the shape of your data - where the mountains and valleys are. I'll group your numbers into bins and show you the distribution. Are most values clustered in the middle? Skewed to one side? Multiple peaks? I'll reveal if your data is bell-shaped, wonky, or hiding surprises.
Overview
A histogram is a graphical representation of the distribution of numerical data. It groups values into bins (intervals) and displays the frequency or count of observations falling into each bin using bars.
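To make the binning step concrete, here is a minimal sketch using only the Python standard library. The function name `histogram_counts` and the sample ages are made up for illustration, and the tool's own binning may treat bin edges differently.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * num_bins
    for v in values:
        # Bins are half-open [left, right); the maximum goes into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

ages = [21, 22, 25, 31, 33, 34, 35, 41, 47, 60]  # illustrative data
edges, counts = histogram_counts(ages, num_bins=4)
print(edges)
print(counts)
```

Each entry in `counts` is the height of one bar, and `edges` marks where each bar starts and ends.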
Best used for:
- Understanding data distribution patterns (normal, skewed, bimodal)
- Identifying the central tendency and spread of data
- Detecting outliers and unusual patterns
- Comparing distributions across different groups
- Quality control and process monitoring
Common Use Cases
Statistics & Data Analysis
- Age distribution in a population
- Test score distributions
- Income or salary ranges
- Measurement error analysis
Quality Control & Manufacturing
- Measurement variation analysis
- Process capability studies
- Defect distribution patterns
- Tolerance compliance checking
Data Science & Machine Learning
- Feature distribution analysis before modeling
- Identifying need for data transformations
- Detecting skewness and outliers
- Understanding target variable distribution
Options
Target Columns
Required - Select one or more numerical columns to visualize.
You can add multiple columns to compare their distributions side-by-side on the same plot. Each column will be shown in a different color.
Comparisons are easiest to read when the columns share the same unit and a similar range.
Note: Use the "+" button to add each additional column.
Settings
Show Frequency
Optional - Display count or frequency on Y-axis.
- On: Shows actual count of observations in each bin
- Off: Shows probability density (normalized)
Show Legend
Optional - Display legend when multiple columns are shown.
Useful when comparing distributions of multiple variables.
Show Axis Labels
Optional - Display axis labels.
Annotate Bars
Optional - Show values on top of each bar.
Displays the count or frequency for each bin directly on the histogram.
Show KDE
Optional - Overlay a Kernel Density Estimate curve.
A KDE provides a smooth, continuous estimate of the probability density function.
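For intuition, a KDE can be sketched as a sum of small Gaussian bumps, one per data point. The bandwidth below uses the normal-reference rule of thumb (1.06 × standard deviation × n^(-1/5)). That choice is an assumption for this example, and the tool may select its bandwidth differently.

```python
import math
import statistics

def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the tool's setting).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

samples = [1.0, 1.5, 2.0, 2.2, 2.5, 3.0, 4.0]  # illustrative data
kde = gaussian_kde(samples)

# Riemann sum over a wide grid: the estimated density should integrate to about 1.
step = 0.01
grid = [-10 + step * i for i in range(2501)]
area = sum(kde(x) for x in grid) * step
print(round(area, 3))
```

A smaller bandwidth follows the data more closely but looks bumpier. A larger one gives a smoother curve that can hide real peaks.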
Number of Bins
Optional - Specify how many bins to use.
Enter a number to control the granularity of the histogram. More bins show more detail but may introduce noise; fewer bins show broader patterns.
If you leave this empty, the number of bins is calculated automatically using Sturges' rule or the Freedman-Diaconis rule.
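Both rules are easy to compute by hand. The sketch below shows the textbook formulas. The function names are hypothetical, and the tool's exact choice between the two rules (and its quartile method) is not specified here.

```python
import math
import statistics

def sturges_bins(values):
    """Sturges' rule: k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(len(values))) + 1

def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) / len(values) ** (1 / 3)
    if width == 0:
        return 1  # degenerate case: no spread in the middle half of the data
    return math.ceil((max(values) - min(values)) / width)

data = list(range(1, 101))  # 100 evenly spaced values, for illustration only
print(sturges_bins(data), freedman_diaconis_bins(data))
```

Sturges' rule depends only on the number of observations, while Freedman-Diaconis also accounts for the spread of the data, so the two can disagree noticeably on skewed or heavy-tailed columns.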
Bin Size
Optional - Specify the width of each bin.
Alternative to "Number of Bins". Sets a fixed width for bins (e.g., bins of width 5 for ages: 0-5, 5-10, 10-15, etc.).
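As a rough sketch of how a fixed bin size turns into bin edges, the helper below (a hypothetical name, not part of the tool) aligns edges to multiples of the bin size. Whether the tool starts at a multiple of the bin size or at the minimum value is an assumption here.

```python
import math

def edges_for_bin_size(values, bin_size):
    """Return bin edges aligned to multiples of bin_size that cover all values."""
    start = math.floor(min(values) / bin_size) * bin_size
    # Add one bin so the maximum falls inside the last half-open bin [left, right).
    n_bins = math.floor((max(values) - start) / bin_size) + 1
    return [start + i * bin_size for i in range(n_bins + 1)]

ages = [3, 7, 12, 14, 15]  # illustrative data
edges = edges_for_bin_size(ages, bin_size=5)
print(edges)
```

With half-open bins, a value of exactly 15 belongs to the 15-20 bin, not the 10-15 bin.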
Cumulative
Optional - Show cumulative distribution.
Instead of showing frequency in each bin, shows cumulative frequency up to that bin.
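In other words, each cumulative bar is a running total of the regular bars. A minimal illustration with made-up counts:

```python
from itertools import accumulate

counts = [3, 4, 2, 1]                   # per-bin counts (illustrative)
cumulative = list(accumulate(counts))   # running total up to and including each bin
print(cumulative)
```

The last cumulative bar always equals the total number of observations (or 1, or 100%, if a normalization is applied).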
Normalization
Optional - How to normalize the histogram.
Options:
- None - Show raw counts
- Probability - Normalize so bars sum to 1
- Probability Density - Normalize so the total area of the bars equals 1 (bar height = probability / bin width)
- Percent - Show as percentages
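The four options differ only in how the raw counts are rescaled. The sketch below uses a hypothetical `normalize` helper and made-up counts, and assumes all bins share the same width.

```python
def normalize(counts, bin_width, mode):
    """Rescale per-bin counts according to the chosen normalization mode."""
    total = sum(counts)
    if mode == "none":
        return list(counts)
    if mode == "probability":
        return [c / total for c in counts]            # bar heights sum to 1
    if mode == "percent":
        return [100 * c / total for c in counts]      # bar heights sum to 100
    if mode == "probability density":
        return [c / (total * bin_width) for c in counts]  # bar areas sum to 1
    raise ValueError(f"unknown mode: {mode}")

counts = [3, 4, 2, 1]   # illustrative per-bin counts
bin_width = 5

probability = normalize(counts, bin_width, "probability")
percent = normalize(counts, bin_width, "percent")
density = normalize(counts, bin_width, "probability density")
print(probability, percent, density)
```

Probability density is the mode to use when comparing histograms with different bin widths, because it is the bar area, not the bar height, that represents probability.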
Histogram Function
Optional - Statistical function to apply.
Options:
- Count - Number of observations (default)
- Sum - Sum of values
- Average - Mean of values
- Min - Minimum value
- Max - Maximum value
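The sketch below illustrates the idea of replacing the per-bin count with another aggregate. It assumes the function is applied to the values that land in each bin. The tool may instead aggregate a separate value column, so treat the names and behavior here as illustrative only.

```python
from statistics import mean

# Hypothetical mapping from option name to aggregation function.
FUNCS = {"count": len, "sum": sum, "average": mean, "min": min, "max": max}

def aggregate_bins(values, bin_size, func):
    """Group values into bins of width bin_size and apply func to each non-empty bin."""
    groups = {}
    for v in values:
        left = (v // bin_size) * bin_size  # left edge of the bin containing v
        groups.setdefault(left, []).append(v)
    return {left: FUNCS[func](vals) for left, vals in sorted(groups.items())}

values = [1, 2, 4, 6, 7, 12]  # illustrative data
for name in FUNCS:
    print(name, aggregate_bins(values, 5, name))
```

Only Count produces a histogram in the strict sense. The other functions turn each bar into a summary of the values in that bin.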
Bar Mode
Optional - How to display multiple histograms.
Options:
- Overlay - Overlay histograms with transparency
- Group - Place bars side-by-side
- Stack - Stack bars on top of each other
Opacity
Optional - Transparency of bars (0-1).
Lower values make bars more transparent, useful when overlaying multiple distributions.
Understanding Distributions
Normal Distribution (Bell Curve)
Symmetric distribution with most values near the mean.
Characteristics:
- Symmetric around the mean
- Mean ≈ Median ≈ Mode
- About 68% of data falls within 1 standard deviation of the mean
- About 95% falls within 2 standard deviations
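The 68% and 95% figures can be checked against the standard normal cumulative distribution function. A short verification using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Probability mass within 1 and 2 standard deviations of the mean.
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)
print(f"{within_1:.4f} {within_2:.4f}")
```

If the shares in your own data are far from these values, the column is probably not well described by a normal distribution.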
Right-Skewed Distribution
Long tail on the right side.
Characteristics:
- Mean is typically greater than the median (Mean > Median)
- Common in: income data, response times, sizes
- May need log transformation for analysis
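A small sketch of why a log transformation helps: on right-skewed data a few large values pull the mean well above the median, and taking logarithms shrinks that gap. The response times below are made up for illustration.

```python
import math
from statistics import mean, median

response_ms = [80, 90, 95, 100, 110, 120, 150, 200, 400, 1500]  # made-up values
logged = [math.log10(v) for v in response_ms]

# Relative gap between mean and median, before and after the transform.
gap_raw = (mean(response_ms) - median(response_ms)) / median(response_ms)
gap_log = (mean(logged) - median(logged)) / median(logged)
print(round(gap_raw, 2), round(gap_log, 2))
```

A log transformation only works for strictly positive values. Columns containing zeros or negatives need a different approach.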
Left-Skewed Distribution
Long tail on the left side.
Characteristics:
- Mean is typically less than the median (Mean < Median)
- Less common than right-skewed
- Example: test scores (when most score high)
Bimodal Distribution
Two distinct peaks.
Characteristics:
- Two modes (peaks)
- Suggests two different groups or processes
- Consider separating and analyzing groups individually
Uniform Distribution
Approximately equal frequency across bins.
Characteristics:
- Flat appearance
- All values equally likely
- Example: random number generators
Tips for Effective Histograms
Choose Appropriate Bins
- Too few bins hide important features
- Too many bins create noise
- Start with auto-calculated bins, then adjust
Consider Bin Width
- Use meaningful intervals (e.g., $10,000 for income, 5 years for age)
- Ensure bins don't hide important patterns
Handle Outliers
- Outliers can compress the main distribution
- Consider filtering extreme values or using log scale
- Or show outliers separately
Compare Distributions
- Use overlay mode with transparency
- Or use small multiples (facets)
- Normalize when counts differ greatly
Add Context
- Show mean/median lines
- Add KDE for smooth overview
- Annotate important bins
Check for Artifacts
- Gaps might indicate data collection issues
- Spikes might indicate rounding or discrete values
- Verify patterns make domain sense
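For the outlier tip, one common way to separate extreme values before plotting is the 1.5 × IQR fence. This is just one convention, not something the tool applies for you, and the helper name and data below are hypothetical.

```python
from statistics import quantiles

def iqr_filter(values, k=1.5):
    """Split values into (kept, outliers) using fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 250]  # illustrative data
kept, outliers = iqr_filter(data)
print(kept)
print(outliers)
```

Plotting `kept` shows the shape of the main distribution, while `outliers` can be reported or charted separately so they are not silently dropped.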
Troubleshooting
Issue: Distribution looks choppy or irregular
- Solution: Reduce the number of bins (or increase the bin size), or enable KDE for a smoother view
Issue: Can't see the pattern
- Solution: Try log scale, adjust bin size, or filter outliers
Issue: Multiple distributions hard to compare
- Solution: Use normalization (probability or percent) so heights are comparable
Issue: Bars are too thin or wide
- Solution: Adjust number of bins or specify custom bin size
Issue: Peak is cut off
- Solution: Check Y-axis range in advanced settings
Issue: Data is discrete but the bins treat it as continuous
- Solution: Adjust bins to align with discrete values (e.g., integer ages)
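One way to align bins with integer values is to put the edges at half-integers, so that each integer sits in the middle of its own bin. A minimal sketch, with a hypothetical helper name and made-up ratings:

```python
def integer_bin_edges(values):
    """Return edges at half-integers so every integer value gets its own bin."""
    lo, hi = min(values), max(values)
    return [x - 0.5 for x in range(lo, hi + 2)]

ratings = [1, 2, 2, 3, 3, 3, 5]  # illustrative integer data
edges = integer_bin_edges(ratings)
print(edges)
```

In the tool, the equivalent is usually a bin size of 1. Whether the resulting bins are centered on the integers depends on where the tool starts the first bin.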