
How to Create a Sankey Diagram in Excel, Python, and R

From basic concepts to advanced implementation, learn to build effective Sankey diagrams using popular tools. Discover the key components, best practices, and design principles that make flow visualizations compelling and insightful.
Jun 13, 2025  · 9 min read

The beauty of Sankey diagrams lies in their ability to simplify multi-stage systems. Instead of hunting through rows of data to find the largest energy losses or budget allocations, you can spot them instantly by looking for the thickest flows. This makes them useful for energy management, financial analysis, marketing funnel optimization, and any scenario where understanding the flow and transformation of resources matters more than precise numerical comparisons.

If you want to expand your analytical capabilities beyond flow visualization, our Data Visualization in Power BI course and Data Visualization in Tableau course teach you to create professional dashboards and interactive reports using industry-leading business intelligence platforms.

What Is a Sankey Diagram?

A Sankey diagram is a specialized flow visualization where the width of connecting arrows represents the magnitude of flow between different stages, categories, or entities. Unlike traditional flowcharts that show process steps or bar charts that compare discrete values, Sankey diagrams excel at showing how quantities move, transform, or get distributed through a system.

Sankey diagram components shown. Image by Author.

The diagram above illustrates how a $100,000 annual budget flows through different categories. Notice how the Marketing allocation ($40,000) appears as a visibly thicker flow compared to R&D ($25,000), making the proportional differences immediately apparent.

History and evolution of Sankey diagrams

The first known Sankey diagram appeared in 1898 when Captain Matthew Henry Phineas Riall Sankey used it to show the energy efficiency of a steam engine. His diagram revealed that only a small portion of the fuel's energy actually contributed to useful work, with most being lost as waste heat.

The very first Sankey diagram.

However, the concept of proportional flow visualization predates Captain Sankey. Charles Joseph Minard created what many consider the most famous flow diagram in 1869, depicting Napoleon's disastrous 1812 Russian campaign. Minard's diagram showed the army's diminishing size as it advanced into Russia and then retreated, with the line thickness representing the number of surviving soldiers.

Components of a Sankey diagram

Understanding the key elements of a Sankey diagram helps you both interpret existing ones and create your own effectively.

  • Nodes represent the categories, stages, or entities in your system. In our budget example, "Annual Budget," "Marketing," and "Digital Ads" are all nodes. Source nodes (like "Annual Budget") typically appear on the left, while target nodes (like "Digital Ads") appear on the right, though this can vary depending on your layout preferences.
  • Flows or links are the directional connectors between nodes, and their width is proportional to the value they represent. The thick orange flow from Annual Budget to Marketing represents $40,000, while the much thinner flow to Content represents only $5,000. This proportional width is the defining characteristic that makes Sankey diagrams so effective at highlighting differences in magnitude.
  • Values are the numerical data that determine each flow's width. These could represent money, energy, materials, people, or any quantifiable resource moving through your system. The diagram automatically calculates the appropriate width based on these values, ensuring visual accuracy.
  • Drop-offs are special flows that represent losses, waste, or resources that exit the system without reaching a target node. While our budget example doesn't show drop-offs, you might see them in energy diagrams showing heat loss or in marketing funnels showing customers who abandon the process.
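To make the link between values and flow widths concrete, here is a minimal sketch of the proportional scaling idea, using the three top-level flows from the budget example. The function name `link_widths` and the `total_height_px` parameter are hypothetical, chosen for illustration; real charting libraries also account for node padding and layout, so treat this as the core arithmetic only.

```python
# Top-level flows from the budget example, in thousands of USD.
flows = [
    ("Annual Budget", "Marketing", 40),
    ("Annual Budget", "Operations", 35),
    ("Annual Budget", "R&D", 25),
]


def link_widths(flows, total_height_px=400):
    """Scale each flow's value to a pixel width (hypothetical helper).

    The widths of all flows add up to total_height_px, so each width
    is proportional to that flow's share of the total.
    """
    total = sum(value for _, _, value in flows)
    return {
        (source, target): total_height_px * value / total
        for source, target, value in flows
    }


widths = link_widths(flows)
for (source, target), width in widths.items():
    print(f"{source} -> {target}: {width:.0f} px")
```

Because Marketing receives 40% of the total, its flow takes 40% of the available height, which is why it looks visibly thicker than the R&D flow.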

How to Create a Sankey Diagram

Creating Sankey diagrams requires different approaches depending on your preferred tools and technical comfort level. We'll walk through the same budget allocation example using Excel, Python, and R, so you can choose the method that best fits your workflow and expertise.

Sankey diagram in Excel

Excel doesn't include a native Sankey chart type, which means you'll need to use a third-party add-in to create these visualizations. In my experience, ChartExpo is one of the most popular and user-friendly options.

ChartExpo interface and Sankey diagram preview. Image by Author.

Before creating the diagram, you'll need to structure your data in a source-target-value format where each row represents one flow connection. For our budget example, this means listing each budget allocation as a separate row with the source category, target category, and dollar amount.
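If your figures live outside Excel, one way to produce this layout is to write them to a CSV file that Excel can open. The sketch below uses only Python's standard library. The column headers `Source`, `Target`, and `Value` and the helper name `to_csv` are assumptions for illustration; check which headers your add-in expects.

```python
import csv
import io

# One row per flow: source category, target category, dollar amount.
rows = [
    ("Annual Budget", "Marketing", 40000),
    ("Annual Budget", "Operations", 35000),
    ("Annual Budget", "R&D", 25000),
    ("Marketing", "Digital Ads", 25000),
    ("Marketing", "Events", 10000),
    ("Marketing", "Content", 5000),
    ("Operations", "Salaries", 20000),
    ("Operations", "Office", 10000),
    ("Operations", "Utilities", 5000),
    ("R&D", "Software", 15000),
    ("R&D", "Equipment", 10000),
]


def to_csv(rows):
    """Return the flows as CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Value"])  # assumed header names
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)

# To save the table for Excel, write the text to a file, for example:
# with open("budget_flows.csv", "w", newline="") as f:
#     f.write(csv_text)
```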

Once your data is ready, the process is straightforward. First, install ChartExpo from Microsoft AppSource or through Excel's add-in marketplace. Then, select your data range, including the headers, and choose Sankey Chart from ChartExpo's visualization options.

The add-in automatically detects your source, target, and value columns based on your data structure. As shown in the interface above, ChartExpo provides a preview of your diagram along with options to Create Chart From Selection, customize the visualization, or export the finished chart for use in presentations or reports.

Sankey diagram in Python

Python offers several options for creating Sankey diagrams. Plotly is a common choice because of its interactive capabilities and professional output quality. We'll recreate the budget allocation visualization from the start of this article through code.

Step 1: Data preparation

Start by organizing your data into the format Plotly expects. You'll need four components: a list of node names, plus three parallel arrays specifying the source index, target index, and value for each flow.

import plotly.graph_objects as go

# Define all nodes in your diagram
nodes = ["Annual Budget", "Marketing", "Operations", "R&D", 
         "Digital Ads", "Events", "Content", "Salaries", 
         "Office", "Utilities", "Software", "Equipment"]

# Define the connections (using node indices)
source_indices = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
target_indices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
values = [40, 35, 25, 25, 10, 5, 20, 10, 5, 15, 10]

The indices correspond to positions in your nodes list, so the leading 0, 0, 0 in source_indices means the first three flows start from "Annual Budget" (position 0).
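Writing index arrays by hand gets error-prone as the diagram grows. One possible approach, sketched below, is to describe each flow by its labels and derive the indices from the node list. The names `labelled_links` and `index_of` are hypothetical, introduced only for this illustration.

```python
nodes = ["Annual Budget", "Marketing", "Operations", "R&D",
         "Digital Ads", "Events", "Content", "Salaries",
         "Office", "Utilities", "Software", "Equipment"]

# Each flow described by name instead of by position (values in thousands of USD).
labelled_links = [
    ("Annual Budget", "Marketing", 40),
    ("Annual Budget", "Operations", 35),
    ("Annual Budget", "R&D", 25),
    ("Marketing", "Digital Ads", 25),
    ("Marketing", "Events", 10),
    ("Marketing", "Content", 5),
    ("Operations", "Salaries", 20),
    ("Operations", "Office", 10),
    ("Operations", "Utilities", 5),
    ("R&D", "Software", 15),
    ("R&D", "Equipment", 10),
]

# Map each node name to its position in the nodes list.
index_of = {name: position for position, name in enumerate(nodes)}

# Build the three parallel lists from the labelled flows.
source_indices = [index_of[source] for source, _, _ in labelled_links]
target_indices = [index_of[target] for _, target, _ in labelled_links]
values = [value for _, _, value in labelled_links]

print(source_indices)
print(target_indices)
print(values)
```

A misspelled node name raises a `KeyError` here instead of silently pointing a flow at the wrong node.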

Step 2: Basic Sankey creation

Create the core diagram structure using Plotly's Sankey object. The essential parameters are the node definitions and link specifications.

fig = go.Figure(data=[go.Sankey(
    node=dict(
        label=nodes,
        pad=15,
        thickness=20
    ),
    link=dict(
        source=source_indices,
        target=target_indices,
        value=values
    )
)])

This creates a functional Sankey diagram with default styling. The pad controls spacing between nodes, while thickness determines how wide the node rectangles appear.

Step 3: Styling and customization

Enhance your diagram with colors, improved layout, and professional formatting.

# Add colors and transparency
fig.update_traces(
    node_color=["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728",
                "#ff9999", "#ff9999", "#ff9999", "#90ee90",
                "#90ee90", "#90ee90", "#ffcccb", "#ffcccb"],
    link_color=["rgba(255, 127, 14, 0.4)", "rgba(44, 160, 44, 0.4)",
                "rgba(214, 39, 40, 0.4)", "rgba(255, 127, 14, 0.6)",
                "rgba(255, 127, 14, 0.6)", "rgba(255, 127, 14, 0.6)",
                "rgba(44, 160, 44, 0.6)", "rgba(44, 160, 44, 0.6)",
                "rgba(44, 160, 44, 0.6)", "rgba(214, 39, 40, 0.6)",
                "rgba(214, 39, 40, 0.6)"]
)

# Update layout for better presentation
fig.update_layout(
    title="Annual Budget Allocation",
    font=dict(size=16, family="Arial Black", color="black"),
    width=900,
    height=600
)
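The link colors above follow a pattern: each flow takes the color of its department (Marketing, Operations, or R&D), with an opacity of 0.4 for the top-level flows and 0.6 for the rest. As a sketch of how that list could be generated rather than typed out, the helper below converts hex colors to `rgba(...)` strings. The names `hex_to_rgba` and `department_color` are hypothetical, and the rule is inferred from the colors used in this example.

```python
source_indices = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
target_indices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Node index -> hex color for Marketing, Operations, and R&D.
department_color = {1: "#ff7f0e", 2: "#2ca02c", 3: "#d62728"}


def hex_to_rgba(hex_color, alpha):
    """Convert '#rrggbb' to an 'rgba(r, g, b, alpha)' string (hypothetical helper)."""
    digits = hex_color.lstrip("#")
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({red}, {green}, {blue}, {alpha})"


link_colors = []
for source, target in zip(source_indices, target_indices):
    if source == 0:
        # Top-level flow: use the target department's color, more transparent.
        link_colors.append(hex_to_rgba(department_color[target], 0.4))
    else:
        # Second-level flow: use the source department's color.
        link_colors.append(hex_to_rgba(department_color[source], 0.6))

print(link_colors)
```

The resulting list can be passed as `link_color` in place of the hand-written one.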

Step 4: Display and export

Display your diagram and save it in various formats for different uses.

fig.show()  # Display in Jupyter notebook or browser

# Export options
fig.write_html("budget_sankey.html")  # Interactive web version
fig.write_image("budget_sankey.png")  # Static image (requires the kaleido package)

For web applications, you can integrate this directly into Dash apps, making your Sankey diagrams part of interactive dashboards. The resulting visualization matches exactly what we saw in the opening visual. We have a great code-along that teaches you how to Build Dashboards with Plotly and Dash so you can try this idea for yourself.

Sankey diagram in R

R provides excellent capabilities for creating Sankey diagrams through the networkD3 package, which creates interactive, web-ready visualizations. Using our familiar budget allocation data, we'll demonstrate how R can produce the same professional results with built-in interactivity features.

The networkD3 package is specifically designed for creating D3.js-powered network visualizations in R, including Sankey diagrams. This approach offers several advantages: automatic interactivity (hover effects, zooming), easy integration with R Markdown reports, and seamless export options for web deployment.

Step 1: Setup and data preparation

First, install and load the required packages, then structure your data in the format networkD3 expects.

# Install required packages (run once)
install.packages(c("networkD3", "dplyr"))

# Load libraries
library(networkD3)
library(dplyr)

# Create nodes dataframe
nodes <- data.frame(
  name = c("Annual Budget", "Marketing", "Operations", "R&D",
           "Digital Ads", "Events", "Content", "Salaries", 
           "Office", "Utilities", "Software", "Equipment")
)

# Create links dataframe (note: networkD3 uses 0-based indexing)
links <- data.frame(
  source = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3),
  target = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11),
  value = c(40, 35, 25, 25, 10, 5, 20, 10, 5, 15, 10)
)

As in Python, the data is split into a list of nodes and a set of links, but here each lives in its own dataframe. Note that although R itself is 1-indexed, networkD3 expects the links dataframe to reference node positions with zero-based indices.

Step 2: Basic Sankey creation

Create your diagram using the sankeyNetwork() function with essential parameters.

# Create basic Sankey diagram
sankey_plot <- sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "source",
  Target = "target", 
  Value = "value",
  NodeID = "name",
  units = "K USD"
)

# Display the plot
sankey_plot

This generates an interactive Sankey diagram where users can hover over flows to see exact values and drag nodes to reorganize the layout.

Step 3: Customization and styling

Enhance your diagram with colors, sizing, and professional formatting options.

# Advanced Sankey with customization
(sankey_advanced <- sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  units = "K USD",
  fontSize = 14,
  fontFamily = "Arial",
  nodeWidth = 30,
  nodePadding = 20,
  margin = list(top = 50, right = 50, bottom = 50, left = 50),
  height = 600,
  width = 900
))

Step 4: Export and integration options

R makes it easy to save your interactive diagrams in multiple formats and integrate them into reports.

# Save as HTML file
library(htmlwidgets)
saveWidget(sankey_advanced, "budget_sankey.html", selfcontained = TRUE)

# For R Markdown integration, simply include the plot object
# The diagram will render as an interactive widget in your document

# For static image export (optional - requires webshot2 package)
install.packages("webshot2")
library(webshot2)

webshot("budget_sankey.html", "budget_sankey.png", vwidth = 900, vheight = 600)

Interactive Sankey diagram created with R's networkD3 package. Image by Author. 

The resulting diagram provides the same visual insights as our Python and Excel versions, and its built-in interactivity helps users explore the data more thoroughly.

Sankey Diagram Alternatives and Comparisons

Sankey diagrams work best when you have clear directional relationships between categories, where the magnitude of flow matters more than precise comparisons. However, several situations call for different visualization approaches.

When not to use Sankey diagrams

Avoid Sankey diagrams when there's no directional flow between your categories. If your data simply shows different groups or classifications without movement between them, bar charts or pie charts will communicate your message more clearly. For example, comparing market share across different companies doesn't involve flow, so a bar chart would be more appropriate.

Skip them when you need precise numerical comparisons. While Sankey diagrams effectively show relative magnitudes, the varying widths make it difficult for readers to extract exact values or make detailed comparisons. If stakeholders need to compare specific percentages or amounts accurately, tables or bar charts serve better.

Consider alternatives when your data becomes too complex and clutters the diagram. With more than 10-15 nodes or highly interconnected flows, Sankey diagrams can become visually overwhelming. The crossing lines and overlapping flows make it hard to follow individual paths through the system.
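If you still want a Sankey diagram for a crowded dataset, one option (not covered elsewhere in this article, so treat it as a suggestion) is to fold the smallest flows into a single "Other" link per source before plotting. The sketch below shows the idea on the budget data. The function name `merge_small_flows`, the `min_value` threshold, and the "Other" label are all hypothetical choices.

```python
from collections import defaultdict


def merge_small_flows(links, min_value, other_label="Other"):
    """Fold flows smaller than min_value into one 'Other' link per source node."""
    kept = []
    merged = defaultdict(int)
    for source, target, value in links:
        if value < min_value:
            merged[source] += value  # accumulate small flows per source
        else:
            kept.append((source, target, value))
    for source, total in merged.items():
        kept.append((source, f"{other_label} ({source})", total))
    return kept


# Budget flows from this article, in thousands of USD.
links = [
    ("Annual Budget", "Marketing", 40),
    ("Annual Budget", "Operations", 35),
    ("Annual Budget", "R&D", 25),
    ("Marketing", "Digital Ads", 25),
    ("Marketing", "Events", 10),
    ("Marketing", "Content", 5),
    ("Operations", "Salaries", 20),
    ("Operations", "Office", 10),
    ("Operations", "Utilities", 5),
    ("R&D", "Software", 15),
    ("R&D", "Equipment", 10),
]

simplified = merge_small_flows(links, min_value=15)
print(len(links), "->", len(simplified), "links")
for link in simplified:
    print(link)
```

The totals are preserved, but the merged categories lose their individual detail, so this trade-off only makes sense when the small flows are not the point of the chart.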

Choose simpler visualizations when your audience is unfamiliar with Sankey diagrams. Since they're less common than bar charts or line graphs, some audiences may focus more on understanding the format than interpreting your data. In presentations to general audiences, stick with familiar chart types unless the flow relationship is essential to your message.

Better alternatives for specific scenarios

Alluvial diagrams work better for categorical or time-based flows where you're tracking changes across multiple time periods or stages. While Sankey diagrams show quantities flowing through a system at one point in time, alluvial diagrams excel at showing how categorical data evolves. For example, tracking how voters move between political parties across multiple elections, or how students change majors throughout college, fits alluvial diagrams better than Sankey diagrams.

Parallel coordinate plots serve better for comparing multivariate data where you want to see patterns across multiple dimensions simultaneously. These work well when you have many variables for each data point and want to identify clusters or outliers. For instance, comparing cars across price, fuel efficiency, safety ratings, and performance metrics works better with parallel coordinates than trying to force these relationships into a flow format.

Bump charts handle rank changes over time more effectively than either Sankey or alluvial diagrams. When you're showing how different entities move up or down in rankings over time periods, bump charts clearly show the trajectory without the visual complexity of flows. Think of tracking how different companies' market positions change over quarters, or how sports teams move through league standings over seasons.

To learn more, read our Top 5 Business Intelligence Courses to Take on DataCamp blog post, which provides guidance on building expertise with the important BI tools.

Conclusion

Successful visualization depends on choosing the right tool for your specific situation. Use Sankey diagrams when directional flow relationships matter more than precise numerical comparisons, and when your audience needs to quickly identify the most significant flows in a system.

For readers interested in expanding beyond Sankey diagrams, our 10 Data Visualization Project Ideas for All Levels blog post provides hands-on project suggestions across different complexity levels to build your visualization portfolio. These projects help develop critical thinking skills and create tangible evidence of your data visualization capabilities.


Author
Vinod Chugani

As an adept professional in Data Science, Machine Learning, and Generative AI, Vinod dedicates himself to sharing knowledge and empowering aspiring data scientists to succeed in this dynamic field.

FAQs

What's the difference between a Sankey diagram and a flowchart?

While flowcharts show process steps and decision points, Sankey diagrams specifically visualize the flow and quantity of resources, energy, or data between different stages. The width of the arrows in Sankey diagrams is proportional to the values being measured, whereas flowcharts focus on process logic rather than quantities.

What kind of data is best suited for Sankey diagrams?

Sankey diagrams work best with flow-based data that shows movement or transformation from one stage to another, such as energy distribution, website conversion funnels, supply chain flows, or budget allocations. They're not suitable for purely categorical data or datasets where there's no directional relationship between the variables.

What are some good online tools for creating Sankey diagrams without coding?

For users who prefer web-based solutions, SankeyMATIC offers a free, simple interface for basic diagrams, while Flourish provides more advanced features and interactivity for professional presentations. Google Charts and Highcharts are excellent for developers who want to embed Sankey diagrams in websites, and Visual Paradigm offers comprehensive diagramming capabilities as part of a broader business tool suite.

When should I avoid using a Sankey diagram?

Avoid Sankey diagrams when you need precise numerical comparisons (since flow widths can be hard to measure exactly), when your data has too many categories that would create visual clutter, or when there's no actual directional flow between your data points. Also consider simpler alternatives if your audience is unfamiliar with this visualization type, as the novelty might overshadow your message.

How do I handle negative values or losses in a Sankey diagram?

Sankey diagrams typically don't display negative values directly since arrow widths represent positive quantities. Instead, show losses as separate outgoing flows from nodes, or use drop-off flows that don't connect to target nodes to represent waste or lost resources.
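As a rough sketch of the first approach, the code below compares each intermediate node's inflow with its outflow and adds an explicit link to a loss node for the difference. The function name `add_loss_flows`, the "Losses" label, and the energy figures are all made up for illustration and are not measurements.

```python
from collections import defaultdict


def add_loss_flows(links, loss_label="Losses"):
    """Add a link to loss_label wherever a node receives more than it passes on."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value

    result = list(links)
    for node in inflow:
        # Only intermediate nodes (with both inflow and outflow) are checked;
        # final targets legitimately have no outgoing flows.
        if node in outflow and inflow[node] > outflow[node]:
            result.append((node, loss_label, inflow[node] - outflow[node]))
    return result


# Hypothetical figures: 100 units of fuel in, 30 units of useful work out.
links = [
    ("Fuel", "Engine", 100),
    ("Engine", "Useful work", 30),
]

with_losses = add_loss_flows(links)
for link in with_losses:
    print(link)
```

Every value stays positive: the missing 70 units appear as their own outgoing flow rather than as a negative number.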

What's the difference between Sankey and Alluvial diagrams?

Sankey diagrams focus on flow quantities at a single point in time, while alluvial diagrams show how categorical data changes across multiple time periods or stages. Alluvial diagrams are better for tracking migration, changes in categories, or evolution over time.
