Sankey Diagram
Visualize flow and relationships between nodes
Use me when you want to watch flows move between categories like rivers branching and merging. Thick ribbons for big flows, thin ones for trickles. Perfect for energy flows, budget allocations, user paths through your website, or any system where things move from source to destination. I make the invisible visible - like watching your money or resources actually flow through stages.
Overview
A Sankey diagram is a flow diagram where the width of arrows or connections is proportional to the flow quantity. It shows how resources, energy, money, or other quantities move from sources through intermediaries to destinations. The visual thickness of connections immediately reveals the magnitude of flows, making it easy to identify major paths and bottlenecks.
Best used for:
- Visualizing flow of resources, energy, or money
- Showing multi-step processes and transformations
- Network traffic and data flow analysis
- Customer journey and conversion paths
- Budget allocation and spending breakdown
- Material flow and supply chain visualization
Common Use Cases
Business & Finance
- Budget allocation (departments → projects → expenses)
- Revenue streams and profit distribution
- Customer acquisition funnel (source → channel → conversion)
- Product sales flow (category → subcategory → product)
- Cash flow and money movement
Energy & Environment
- Energy production and consumption
- Carbon emissions flow
- Water usage and distribution
- Material recycling and waste management
- Resource allocation
Web Analytics & User Flow
- Website navigation paths
- User journey through application
- Traffic sources to conversion
- Feature usage patterns
- Drop-off analysis
Options
Source
Required - Column indicating the starting point of flows.
Each unique value becomes a node that flows originate from. Nodes that appear only in this column sit at the start of the diagram (the left side in the default horizontal orientation).
Target
Required - Column indicating the destination of flows.
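Each unique value becomes a node that flows terminate at. Nodes that appear only in this column sit at the end of the diagram (the right side in the default horizontal orientation). Note: A node can be both a source and a target; such intermediate nodes are placed between the two ends.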
Value/Flow
Required - Magnitude of flow between source and target.
Column
Select the numerical column representing flow quantity (e.g., amount, count, volume).
Aggregation Function
Choose how to aggregate flows:
Options:
- Sum - Total flow (most common)
- Mean - Average flow
- Count - Number of connections
- Median - Middle flow value
- Min - Minimum flow
- Max - Maximum flow
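As a rough illustration of what these aggregation options mean, the sketch below collapses rows that share the same source and target into a single flow. It is plain Python, not this tool's internals; `aggregate_flows` and the `AGGREGATORS` table are hypothetical names, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical lookup table mirroring the options listed above.
AGGREGATORS = {
    "sum": sum,
    "mean": mean,
    "count": len,
    "median": median,
    "min": min,
    "max": max,
}

def aggregate_flows(rows, how="sum"):
    """Collapse rows sharing a (source, target) pair into one flow value."""
    grouped = defaultdict(list)
    for source, target, value in rows:
        grouped[(source, target)].append(value)
    func = AGGREGATORS[how]
    return {pair: func(values) for pair, values in grouped.items()}

# Illustrative data: two rows describe the same Marketing -> Website connection.
rows = [
    ("Marketing", "Website", 1000),
    ("Marketing", "Website", 500),
    ("Social", "App", 300),
]

totals = aggregate_flows(rows, "sum")    # link width reflects total flow
counts = aggregate_flows(rows, "count")  # link width reflects number of rows
```

With Sum, the two Marketing → Website rows produce one link of 1500; with Count, the same link has a value of 2.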
Color By (Optional)
Optional - Color flows by category.
When specified, flows are colored based on this categorical column, making it easy to distinguish different types of flows.
Settings
Hide Empty Values
Optional - Exclude flows with no data.
Hide Node Labels
Optional - Hide labels on nodes.
Useful when node names are long or when you want a cleaner visualization.
Orientation
Optional - Direction of flow.
Options:
- Horizontal - Flows left to right (default)
- Vertical - Flows top to bottom
Understanding Sankey Components
Nodes
- Rectangles: Represent categories, stages, or entities
- Height: Proportional to total flow through the node
- Position: Automatically arranged in layers
- Color: Can indicate category or be automatically assigned
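To make "total flow through the node" concrete, here is a minimal sketch that computes a size for each node from a list of flows. It assumes the common convention of taking the larger of a node's incoming and outgoing totals; whether this tool uses exactly that rule is an assumption, and `node_throughput` is a hypothetical helper.

```python
from collections import defaultdict

def node_throughput(flows):
    """Return {node: size}, where size is max(total inflow, total outflow)."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for source, target, value in flows:
        outflow[source] += value
        inflow[target] += value
    nodes = set(inflow) | set(outflow)
    # Assumed convention: a node is as tall as its larger side.
    return {n: max(inflow.get(n, 0.0), outflow.get(n, 0.0)) for n in nodes}

# Illustrative budget data.
flows = [
    ("Budget", "Engineering", 600),
    ("Budget", "Sales", 400),
    ("Engineering", "Salaries", 450),
    ("Engineering", "Cloud", 150),
]

sizes = node_throughput(flows)
```

Here "Budget" gets a size of 1000 (its outflow) and "Engineering" a size of 600, since its inflow and outflow both total 600.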
Links (Flows)
- Width: Proportional to flow magnitude
- Color: Matches source node or custom by category
- Curvature: Shows direction of flow
- Transparency: Often semi-transparent to show overlaps
Layers
- Left to right: Represents progression or transformation
- Multiple layers: Intermediate steps in the flow
- Nodes can repeat: Same entity at different stages
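One way to picture how nodes end up in layers is a longest-path assignment: a node's layer is the number of steps on the longest route from any starting node. The sketch below shows that idea under the assumption that the data has no cycles; it is not necessarily the layout algorithm this tool uses, and `assign_layers` is a hypothetical name.

```python
from collections import defaultdict, deque

def assign_layers(flows):
    """Return {node: layer} using longest-path layering. Assumes no cycles."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for source, target, _ in flows:
        nodes.update((source, target))
        if target not in successors[source]:
            successors[source].add(target)
            indegree[target] += 1

    layer = {n: 0 for n in nodes}
    # Start from nodes that never appear as a target.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            # Push each successor at least one layer past this node.
            layer[nxt] = max(layer[nxt], layer[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(nodes):
        raise ValueError("flows contain a cycle; layers are undefined in this sketch")
    return layer

# Illustrative customer-journey data.
flows = [
    ("Ads", "Landing", 100),
    ("Email", "Landing", 50),
    ("Landing", "Signup", 90),
    ("Ads", "Signup", 10),
    ("Signup", "Purchase", 40),
]

layers = assign_layers(flows)
```

Note that "Signup" lands in layer 2 even though "Ads" links to it directly, because the longer route through "Landing" decides its position.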
Tips for Effective Sankey Diagrams
Data Structure:
- Each row represents one flow connection
- Source and target columns define connections
- Value column indicates flow magnitude
- Example: Source="Marketing", Target="Website", Value=1000
Simplify When Needed:
- Limit to 10-15 nodes for clarity
- Group small flows into "Other"
- Filter out minor connections below threshold
- Consider multiple diagrams for complex systems
Use Color Strategically:
- Color by source to track origins
- Color by category to distinguish flow types
- Use consistent colors across related visualizations
- Ensure accessibility (colorblind-friendly)
Orientation Choice:
- Horizontal: Traditional, good for time-based flows
- Vertical: Better for top-down hierarchies
- Match orientation to mental model of process
Handle Complex Flows:
- Break into multiple diagrams if too complex
- Focus on main flows first
- Use filtering to show different aspects
- Consider animation for temporal data
Label Strategically:
- Keep node names short and clear
- Use hover tooltips for details
- Show values on major flows
- Hide labels if too cluttered
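The "Simplify When Needed" tip of grouping small flows into "Other" can be done during data preparation. The sketch below is one possible approach, not a built-in feature of this tool: it reroutes every flow below a threshold to an "Other" target per source. `group_small_flows`, the threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict

def group_small_flows(flows, threshold, other_label="Other"):
    """Send flows below `threshold` to a shared `other_label` target."""
    merged = defaultdict(float)
    for source, target, value in flows:
        # Small flows keep their source but share one catch-all target.
        key = (source, target if value >= threshold else other_label)
        merged[key] += value
    return [(s, t, v) for (s, t), v in merged.items()]

# Illustrative budget data with three minor categories.
flows = [
    ("Budget", "Engineering", 600),
    ("Budget", "Sales", 300),
    ("Budget", "Travel", 20),
    ("Budget", "Events", 15),
    ("Budget", "Swag", 5),
]

simplified = group_small_flows(flows, threshold=50)
```

The total flow is preserved (940 before and after), so node sizes upstream do not change. In multi-stage data, a rerouted target loses its connection to later stages, so this works best on the final layer.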
Common Patterns
Simple Flow (2 Layers)
Sources → Destinations
Marketing → Website
Social → App
Email → Direct
Multi-Stage Flow (3+ Layers)
Sources → Channels → Conversions → Revenue
Traffic Sources → Landing Pages → Actions → Sales
Converging Flow
Multiple sources feeding into fewer destinations (consolidation).
Diverging Flow
Single source splitting into multiple destinations (distribution).
Circular Flow
Nodes that connect back to earlier stages (recycling, feedback loops).
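Because circular flows can make layering ambiguous, it may help to check for them before charting. The following is a minimal sketch using a depth-first search over the source and target columns; `find_cycle` is a hypothetical helper, and the recycling data is invented for illustration.

```python
from collections import defaultdict

def find_cycle(flows):
    """Return one cycle as a list of nodes (first == last), or None."""
    graph = defaultdict(list)
    for source, target, _ in flows:
        graph[source].append(target)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = defaultdict(int)
    path = []

    def visit(node):
        state[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            if state[nxt] == GRAY:
                # Reached a node already on the current path: a loop.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for start in list(graph):
        if state[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None

# Illustrative recycling loop.
flows = [
    ("Raw Material", "Product", 100),
    ("Product", "Waste", 30),
    ("Waste", "Recycling", 20),
    ("Recycling", "Raw Material", 15),
]

cycle = find_cycle(flows)
```

If a loop is found, one option is to rename the returning node (for example "Raw Material (recycled)") so the feedback shows up as a later stage, in line with the note that the same entity can repeat at different stages.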
Example Scenarios
Budget Allocation
Company Budget → Departments → Projects → Expenditures
Customer Journey
Traffic Source → Landing Page → Action → Conversion
Energy Flow
Production → Distribution → Consumption → Waste
Revenue Streams
Product Categories → Sales Channels → Customer Segments → Revenue
Troubleshooting
Issue: Diagram is too cluttered
- Solution: Reduce the number of flows by filtering out low-value connections, grouping minor categories into "Other", or splitting the data into multiple diagrams.
Issue: Nodes are overlapping
- Solution: Reduce the number of nodes, increase the plot height, or adjust node padding in advanced settings.
Issue: Can't see small flows
- Solution: Use logarithmic scaling (advanced), keeping in mind that link widths are then no longer directly proportional to flow values; filter out large flows to see detail; or create a separate diagram for small flows.
Issue: Node order is confusing
- Solution: Nodes are automatically ordered. You may need to rename nodes to control their position, or manually specify node order in data preparation.
Issue: Flows cross each other messily
- Solution: This is common with complex networks. Simplify by removing minor flows, grouping categories, or reorganizing data structure.
Issue: Colors don't distinguish flows
- Solution: Use the "Color By" option to categorize flows by a meaningful attribute. Ensure sufficient color contrast.
Issue: Labels are cut off or overlapping
- Solution: Enable "Hide Node Labels" and rely on hover tooltips, or increase plot size to accommodate labels.
Issue: Cannot trace flow path
- Solution: Use consistent naming between source and target. Hover over flows to highlight paths. Consider color coding by origin.
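Inconsistent naming is easy to miss because "Landing Page" and "landing page" become two separate nodes, which breaks the path between stages. The sketch below is one way to spot such near-duplicates during data preparation, assuming that names differing only in case or whitespace are meant to be the same node. `find_inconsistent_names` is a hypothetical helper.

```python
from collections import defaultdict

def find_inconsistent_names(flows):
    """Group node names that differ only by case or whitespace."""
    variants = defaultdict(set)
    for source, target, _ in flows:
        for name in (source, target):
            # Normalize: collapse whitespace and ignore case.
            key = " ".join(name.split()).casefold()
            variants[key].add(name)
    return {key: sorted(names) for key, names in variants.items() if len(names) > 1}

# Illustrative data with three spellings of the same node.
flows = [
    ("Email", "Landing Page", 120),
    ("landing page", "Signup", 80),
    ("Landing Page ", "Signup", 10),
]

problems = find_inconsistent_names(flows)
```

An empty result means every node name is spelled one way; otherwise each entry lists the spellings to unify before charting.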