Skip to main content

Python Excel: A Guide With Examples

Learn how to read and import Excel files in Python, write data to these spreadsheets, and find the best packages to do this.
Updated Oct 30, 2025  · 15 min read

One limitation of Excel is its inability to handle large amounts of data. You can run into serious performance issues when trying to perform complex operations on a lot of data entries in Excel, especially if your formulas and macros are not optimized for performance.

Excel can also become very time-consuming if you need to perform repetitive tasks. For instance, if you need to replicate an analysis on multiple Excel files every week, you would have to open them manually and copy-paste the same formulas over and over.

Surveys show that 93% of Excel users find it time-consuming to consolidate spreadsheets and that employees spend approximately 12 hours each month just combining different Excel files.

These drawbacks can be solved by automating Excel workflows with Python. Tasks like spreadsheet consolidation, data cleaning, and predictive modeling can be done in minutes using a simple Python script that writes to an Excel file.

Excel users can also create a scheduler in Python that runs the script automatically at different time intervals, dramatically reducing the amount of human intervention required to perform the same task again and again.

In this article, I will show you step-by-step how to:

  • Use a library called Openpyxl to read and write Excel files using Python
  • Create arithmetic operations and Excel formulas in Python
  • Manipulate Excel worksheets using Python
  • Build visualizations in Python and save them to an Excel file
  • Format Excel cell colors and styles using Python

AI Upskilling for Beginners

Learn the fundamentals of AI and ChatGPT from scratch.
Learn AI for Free

Introduction to Openpyxl

Openpyxl is a Python library that allows users to read Excel files and write to them. 

This framework can help you write functions, format spreadsheets, create reports, and build charts directly in Python without even having to open an Excel application.

Furthermore, Openpyxl allows users to iterate through worksheets and perform the same analysis on multiple sets of data at the same time. 

This improves efficiency and allows for the automation of Excel workflows since users only need to perform the analysis on one worksheet and can replicate as many times as required.

How to Install Openpyxl

To install Openpyxl, simply open your command prompt or Powershell and type the following command:

$pip install openpyxl

You should see the following message appear indicating that the package is successfully installed:

Openpyxl insall success

Reading Excel Files in Python with Openpyxl

We will be using Kaggle’s Video Game Sales dataset in this tutorial. This dataset has been preprocessed by our team for the purpose of this tutorial, and you can download the modified version from this link. You can import Excel into Python by following the process below:

Loading the workbook

After downloading the dataset, import the Openpyxl library and load the workbook into Python:

import openpyxl 

wb = openpyxl.load_workbook('videogamesales.xlsx')

Now that the Excel file is loaded as a Python object, you need to tell the library which worksheet to access. There are two ways to do this:

The first method is to simply call the active worksheet, which is the first sheet in the workbook, using the following line of code:

ws = wb.active

Alternatively, if you know the name of the worksheet, you can also access it by its name. We will be using the “vgsales” sheet in this section of the tutorial:

ws = wb['vgsales']

Let’s now count the number of rows and columns in this worksheet:

print('Total number of rows: '+str(ws.max_row)+'. And total number of columns: '+str(ws.max_column))

The above code should render the following output:

Total number of rows: 16328. And total number of columns: 10

Now that we know the dimensions of the sheet, let’s move on and learn how to read data from the workbook. 

Reading data from a cell

Here is a screenshot of the active sheet that we will be working with in this section:

Video games sales data

To retrieve data from a specific cell with Openpyxl, you can type the cell’s value in like this:

print(f"The value in cell A1 is: {ws['A1'].value}")

You should get the following output:

The value in cell A1 is: Rank

Reading data from multiple cells

Now that we know how to read data from a specific cell, what if we wanted to print all cell values in a certain row of the spreadsheet?

To do this, you can write a simple `for loop` to iterate through all the values in a specific row:

values = [ws.cell(row=1,column=i).value for i in range(1,ws.max_column+1)]
print(values)

The code above will print out all the values in the first row:

['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']

Next, let’s try printing out multiple rows in a specific column.

We will create a for loop to render the first ten rows in the “Name” column as a list. We should get the names highlighted in the red box below:

Video games names column

data=[ws.cell(row=i,column=2).value for i in range(2,12)]
print(data)

The above code will generate the following output:

['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii', 'Wii Sports Resort', 'Pokemon Red/Pokemon Blue', 'Tetris', 'New Super Mario Bros.', 'Wii Play', 'New Super Mario Bros. Wii', 'Duck Hunt']

Finally, let’s print the first ten rows in a range of columns in the spreadsheet:

# reading data from a range of cells (from column 1 to 6)

my_list = list()

for value in ws.iter_rows(
    min_row=1, max_row=11, min_col=1, max_col=6, 
    values_only=True):
    my_list.append(value)
    
for ele1,ele2,ele3,ele4,ele5,ele6 in my_list:
    (print ("{:<8}{:<35}{:<10}
             {:<10}{:<15}{:<15}".format(ele1,ele2,ele3,ele4,ele5,ele6)))

The first ten rows of data in the first six columns should be displayed after you run the above code:

video games ranked

Writing to Excel Files with Openpyxl

Now that we know how to access and read data from Excel files, let’s learn how to write to them using Openpyxl.

Writing to a cell

There are two ways you can write to a file with Openpyxl.

First, you can access the cell directly using its key:

ws['K1'] = 'Sum of Sales'

An alternative is to specify the row and column position of the cell that you’d like to write to:

ws.cell(row=1, column=11, value = 'Sum of Sales')

Every time you write to an Excel file with Openpyxl, you need to save your changes with the following line of code or they will not be reflected in the worksheet:

wb.save('videogamesales.xlsx')

If your workbook is open when you try to save it, you will run into the following permission error:

Permission error

Make sure to close the Excel file before saving your changes. You can then open it again to make sure that the change is reflected in your worksheet:

Video games data with new column

Notice that a new column called “Sum of Sales” has been created in cell K1.

Creating a new column

Let’s now add the sum of sales in every region and write it to column K.

We will do this for the sales data in the first row:

row_position = 2
col_position = 7

total_sales = ((ws.cell(row=row_position, column=col_position).value)+
               (ws.cell(row=row_position, column=col_position+1).value)+
               (ws.cell(row=row_position, column=col_position+2).value)+
               (ws.cell(row=row_position, column=col_position+3).value))

ws.cell(row=2,column=11).value=total_sales
wb.save('videogamesales.xlsx')

Notice that the total sales have been calculated in cell K2 for the first game in the worksheet:

Sum of sales in cell K2

Similarly, let’s create a for loop to sum the sales values in every row:

row_position = 1

for i in range(1, ws.max_row):

    row_position += 1
    NA_Sales = ws.cell(row=row_position, column=7).value
    EU_Sales = ws.cell(row=row_position, column=8).value
    JP_Sales = ws.cell(row=row_position, column=9).value
    Other_Sales = ws.cell(row=row_position, column=10).value

    total_sales = (NA_Sales + EU_Sales + JP_Sales + Other_Sales)
    ws.cell(row=row_position, column=11).value = total_sales

wb.save("videogamesales.xlsx")

Your Excel file should now have a new column reflecting the total sales of video games in all regions:

Sum of sales calculated

Appending new rows

To append a new row to the workbook, simply create a tuple with the values you’d like to include and write it to the sheet:

new_row = (1,'The Legend of Zelda',1986,'Action','Nintendo',3.74,0.93,1.69,0.14,6.51,6.5)

ws.append(new_row)
    
wb.save('videogamesales.xlsx')

You can confirm that this data has been appended by printing the last row in the workbook:

values = [ws.cell(row=ws.max_row,column=i).value for i in range(1,ws.max_column+1)]
print(values)

The following output will be generated:

[1, 'The Legend of Zelda', 1986, 'Action', 'Nintendo', 3.74, 0.93, 1.69, 0.14, 6.51, 6.5]

Deleting rows

To delete the new row we just created, you can run the following line of code:

ws.delete_rows(ws.max_row, 1) # row number, number of rows to delete

wb.save('videogamesales.xlsx')

The first argument in the delete_rows() function is the row number you want to delete. The second argument indicates the number of rows that should be deleted.

Creating Excel Formulas with Openpyxl

You can use Openpyxl to write formulas exactly like you would in Excel. Here are some examples of basic functions you can create using Openpyxl:

AVERAGE()

Let’s create a new column called “Average Sales” to calculate the average total video game sales in all markets:

ws['P1'] = 'Average Sales'
ws['P2'] = '= AVERAGE(K2:K16220)'

wb.save('videogamesales.xlsx')

The average sales across all markets is approximately 0.19. This will be printed in cell P2 of your worksheet.

COUNTA()

The “COUNTA” function in Excel counts cells that are populated within a specific range. Let’s use it to find the number of records between E2 and E16220:

ws['Q1'] = "Number of Populated Cells" 
ws['Q2'] = '=COUNTA(E2:E16220)'

wb.save('videogamesales.xlsx')

There are 16,219 records in this range that have information in them.

COUNTIF()

COUNTIF() is a commonly used Excel function that counts the number of cells that meet a specific condition. Let’s use it to count the number of games in this dataset with the “Sports” genre:

ws['R1'] = 'Number of Rows with Sports Genre'
ws['R2'] = '=COUNTIF(E2:E16220, "Sports")'

wb.save('videogamesales.xlsx')

There are 2,296 sports games in the dataset.

SUMIF()

Now, let’s find the total “Sum of Sales” generated by sports games using the SUMIF function:

ws['S1'] = 'Total Sports Sales'
ws['S2'] = '=SUMIF(E2:E16220, "Sports",K2:K16220)'
 
wb.save('videogamesales.xlsx')

The total number of sales generated by sports games is 454.

CEILING()

The CEILING() function in Excel rounds a number up to the nearest specified multiple. Let’s round up the total amount of sales generated by sports games using this function:

ws['T1'] = 'Rounded Sum of Sports Sales'
ws['T2'] = '=CEILING(S2,25)'

wb.save('videogamesales.xlsx')

We have rounded the total sales generated by sports games to the nearest multiple of 25, which returns a result of 475.

The code snippets above should generate the following output in your Excel sheet (from cells P1 to T2):

Average sales row

You can refer to our Excel Basics Cheat Sheet to learn more about Excel formulas, operators, math functions, and conditional computation.

Working with Sheets in Openpyxl

Now that we know how to access worksheets and write to them, let’s learn how we can manipulate, remove, and duplicate them using Openpyxl.

Changing sheet names

First, let’s print the name of the active sheet we are currently working with using Openpyxl’s title attribute:

print(ws.title)

The following output will be rendered:

vgsales

Now, let’s rename this worksheet using the following lines of code:

ws.title ='Video Game Sales Data'

wb.save('videogamesales.xlsx')

The name of your active sheet should now be changed to “Video Game Sales Data.”

Creating a new worksheet

Run the following line of code to list all the worksheets in the workbook:

print(wb.sheetnames)

You will see an array listing the names of all the worksheets in the file:

['Video Game Sales Data', 'Total Sales by Genre', 'Breakdown of Sales by Genre', 'Breakdown of Sales by Year']

Let’s now create a new empty worksheet:

wb.create_sheet('Empty Sheet') # create an empty sheet
print(wb.sheetnames) # print sheet names again

wb.save('videogamesales.xlsx')

Notice that a new sheet called “Empty Sheet” has now been created:

['Video Game Sales Data', 'Total Sales by Genre', 'Breakdown of Sales by Genre', 'Breakdown of Sales by Year', ‘Empty Sheet’]

Deleting a worksheet

To delete a worksheet using Openpyxl, simply use the remove attribute and print all the sheet names again to confirm that the sheet has been deleted:

wb.remove(wb['Empty Sheet'])
print(wb.sheetnames)

wb.save('videogamesales.xlsx')

Notice that the “Empty Sheet” worksheet is no longer available:

['Video Game Sales Data', 'Total Sales by Genre', 'Breakdown of Sales by Genre', 'Breakdown of Sales by Year']

Duplicating a worksheet

Finally, run this line of code to create a copy of an existing worksheet:

wb.copy_worksheet(wb['Video Game Sales Data'])
wb.save('vgsales_2.xlsx')

Printing all the sheet names again, we get the following output:

['Video Game Sales Data', 'Total Sales by Genre', 'Breakdown of Sales by Genre', 'Breakdown of Sales by Year', 'Video Game Sales Data Copy']

Adding Charts to an Excel File with Openpyxl

Excel is often seen as the go-to tool for creating visualizations and summarizing datasets. In this section, we will learn to build charts in Excel directly from Python using Openpyxl.

Bar chart

Let’s first create a simple bar chart displaying the total sales of video games by genre. We will be using the “Total Sales by Genre” worksheet to do this:

Worksheet for bar chart

This worksheet contains a pivot table in which the sum of sales has been aggregated by genre, as seen in the screenshot above.

Let’s access this worksheet before we start creating the bar chart:

ws = wb['Total Sales by Genre'] # access the required worksheet

Now, we need to tell Openpyxl the values and categories that we’d like to plot.

Values:

The values include the “Sum of Sales” data that we want to plot. We need to tell Openpyxl where to find this data in the Excel file by including the range in which your values begin and end.

Four parameters in Openpyxl allow you to specify where your values are located:

  • Min_column: The minimum column containing data

  • Max_column: The maximum column containing data

  • Min_row: The minimum row containing data

  • Max_row: The maximum row containing data

Here is an image showcasing how you can define these parameters:

Four parameters for defining values

Notice that the minimum row is the first row and not the second one. This is because Openpyxl starts counting from the row that has a numeric value in it.

# Values for plotting

from openpyxl.chart import Reference

values = Reference(ws,         # worksheet object   
                   min_col=2,  # minimum column where your values begin
                   max_col=2,  # maximum column where your values end
                   min_row=1,  # minimum row you’d like to plot from
                   max_row=13) # maximum row you’d like to plot from

Categories

Now, we need to define the same parameters for the categories in our bar chart:

Parameters for bar chart categories

Here is the code you can use to set parameters for the chart’s categories:

cats = Reference(ws, 
                 min_col=1, 
                 max_col=1, 
                 min_row=2, 
                 max_row=13)

Creating the bar chart

Now, we can create the bar chart object and include our values and categories using the following lines of code:

from openpyxl.chart import BarChart

chart = BarChart()
chart.add_data(values, titles_from_data=True)
chart.set_categories(cats)

Set chart titles

Finally, you can set the chart titles and tell Openpyxl where you’d like to create it in the Excel sheet:

# set the title of the chart
chart.title = "Total Sales"

# set the title of the x-axis
chart.x_axis.title = "Genre"

# set the title of the y-axis
chart.y_axis.title = "Total Sales by Genre"

# the top-left corner of the chart
# is anchored to cell F2 .
ws.add_chart(chart,"D2")

# save the file 
wb.save("videogamesales.xlsx")

You can then open the Excel file and navigate to the “Total Sales by Genre” worksheet. You should notice a chart that looks like this appear:

Total sales by genre

Grouped bar chart

Now, let’s create a grouped bar chart displaying total sales by genre and region. You can find the data for this chart in the “Breakdown of Sales by Genre” worksheet:

Table for grouped bar chart

Similar to how we created the bar chart, we need to define the range for values and categories:

Vales and categories definition for grouped bar chart

We can now access the worksheet and write this down in code:

### Creating a Grouped Bar Chart with Openpyxl
ws = wb['Breakdown of Sales by Genre'] # access worksheet

# Data for plotting
values = Reference(ws,
                   min_col=2,
                   max_col=5,
                   min_row=1,
                   max_row=13)

cats = Reference(ws, min_col=1, 
                     max_col=1, 
                     min_row=2, 
                     max_row=13)

We can now create the bar chart object, include the values and categories in it, and set the title parameters exactly like we did previously:

# Create object of BarChart class
chart = BarChart()
chart.add_data(values, titles_from_data=True)
chart.set_categories(cats)

# set the title of the chart
chart.title = "Sales Breakdown"

# set the title of the x-axis
chart.x_axis.title = "Genre"

# set the title of the y-axis
chart.y_axis.title = "Breakdown of Sales by Genre"

# the top-left corner of the chart is anchored to cell H2.
ws.add_chart(chart,"H2")

# save the file 
wb.save("videogamesales.xlsx")

Once you open the worksheet, a grouped bar chart that looks like this should appear:

Grouped bar chart output

Stacked line chart

Finally, we will create a stacked line chart using data in the tab “Breakdown of Sales by Year.” This worksheet contains video game sales data broken down by year and region:

Sum of sales data

Let’s define the range for this chart’s values and categories:

Values and categories for stacked line chart

We can now write these minimum and maximum values in code:

# Data for plotting
values = Reference(ws,
                   min_col=2,
                   max_col=6,
                   min_row=1,
                   max_row=40)

cats = Reference(ws, min_col=1, max_col=1, min_row=2, max_row=40)

Finally, let’s create the line chart object and set the graph’s title, x-axis, and y-axis:

# Create object of LineChart class
from openpyxl.chart import LineChart

chart = LineChart()
chart.add_data(values, titles_from_data=True)
chart.set_categories(cats)

# set the title of the chart
chart.title = "Total Sales"

# set the title of the x-axis
chart.x_axis.title = "Year"

# set the title of the y-axis
chart.y_axis.title = "Total Sales by Year"

# the top-left corner of the chart is anchored to cell H2 
ws.add_chart(chart,"H2")

# save the file 
wb.save("videogamesales.xlsx")

A stacked line chart that looks like this should appear on your worksheet:

Stacked line chart

Formatting Cells Using Openpyxl

Openpyxl allows users to style cells in Excel workbooks. You can make your spreadsheet look prettier by changing font sizes, background colors, and cell borders directly in Python.

Here are some ways to customize your Python Excel spreadsheet using Openpyxl:

Changing font sizes and styles

Let’s increase the font size in cell A1 and bold the text using the following lines of code:

from openpyxl.styles import Font

ws = wb['Video Game Sales Data']
ws['A1'].font = Font(bold=True, size=12)

wb.save('videogamesales.xlsx')

Notice that the text in cell A1 is now slightly larger and bolded:

large font data

Now, what if we wanted to change the font size and style for all the column headers in the first row?

To do this, we can use the same code and simply create a for loop to iterate through all the columns in the first row:

for row in ws.iter_rows(min_row=1, max_row=1):
    for cell in row:
        cell.font = Font(bold=True, size=12)

wb.save('videogamesales.xlsx')

When we iterate through [“1:1”], we are telling Openpyxl the start and end rows to loop through. If we wanted to loop through the first ten rows, for instance, then we’d specify [“1:10”] instead.

You can open the Excel sheet to check if the changes have been reflected:

smaller font data

Changing font color

You can change font colors in Openpyxl using hex codes:

from openpyxl.styles import colors

ws['A1'].font = Font(color = 'FF0000',bold=True, size=12) ## red
ws['A2'].font = Font(color = '0000FF') ## blue

wb.save('videogamesales.xlsx')

After saving the workbook and opening it again, the font colors in cells A1 and A2 should have changed:

change font color

Changing cell background color

To change the background color of a cell, you can use Openpyxl’s PatternFill module:

## changing background color of a cell
from openpyxl.styles import PatternFill

ws["A1"].fill = PatternFill('solid', start_color="38e3ff") # light blue background color
wb.save('videogamesales.xlsx')

The following change should be reflected in your worksheet:

cell color change

Adding cell borders

To add a cell border using Openpyxl, run the following lines of code:

## cell borders
from openpyxl.styles import Border, Side

my_border = Side(border_style="thin", color="000000")

ws["A1"].border = Border(
    top=my_border, left=my_border, right=my_border, bottom=my_border
)

wb.save("videogamesales.xlsx")

You should see a border that looks like this appear across cell A1:

change cell color

Conditional formatting

Conditional formatting is the process of highlighting specific values in an Excel file based on a set of conditions. It allows users to visualize data more easily and better understand the values in their worksheets.

Let’s use Openpyxl to highlight in green all the video game sales values that are greater than or equal to 8:

from openpyxl.formatting.rule import CellIsRule

fill = PatternFill(
    start_color='90EE90',
    end_color='90EE90',fill_type='solid') # specify background color

ws.conditional_formatting.add(
    'G2:K16594', CellIsRule(operator='greaterThan', formula=[8], fill=fill)) # include formatting rule

wb.save('videogamesales.xlsx')

In the first block of code, we specify the background color of the cells we’d like to format. In this case, the color is light green.

Then, we create a conditional formatting rule stating that any value greater than 8 should be highlighted in the fill color we specified. We also state the range of cells in which we’d like to apply this condition.

After running the above code, all the sales values above 8 should be highlighted like this:

conditional formatted

Conclusion

We have covered a lot of ground in this tutorial, starting from the basics of using the Openpyxl library to performing more advanced operations like creating charts and formatting spreadsheets in Python.

On their own, Python and Excel are powerful data manipulation tools used to build predictive models, produce analytical reports, and perform mathematical calculations.

The biggest advantage of Excel is that it is used by almost everybody. From non-technical stakeholders to entry-level individuals, employees of all levels understand reports presented in an Excel spreadsheet. 

Python, on the other hand, is used to analyze and build models on large amounts of data. It can help teams automate laborious tasks and improve organizational efficiency.

When Excel and Python are used together, it can shave hours off a company’s workflows while still sticking to an interface that everyone in the organization is familiar with.

FAQs

How does Python handle large Excel datasets more efficiently than Excel itself?

Python can handle large datasets more effectively by leveraging libraries such as pandas and numpy, which are optimized for performance. Unlike Excel, Python doesn’t rely on a graphical interface, allowing it to process millions of rows in-memory and perform advanced operations without the risk of crashing or significant slowdowns.

Can Python be used to work with non-Excel file formats and convert them to Excel?

Yes, Python can work with various file formats, such as CSV, JSON, and databases. Using libraries like pandas, you can read these formats and export them to Excel files with ease. For example:

import pandas as pd
data = pd.read_csv('data.csv')  # Load CSV file
data.to_excel('data.xlsx', index=False)  # Save as Excel

Is it possible to combine data from multiple Excel files using Python?

Yes, Python allows you to consolidate data from multiple Excel files into a single file or worksheet. This can be done using libraries like openpyxl or pandas. For instance, with pandas, you can read multiple files into dataframes, merge or concatenate them, and save the result back to an Excel file:

import pandas as pd
df1 = pd.read_excel('file1.xlsx')
df2 = pd.read_excel('file2.xlsx')
combined = pd.concat([df1, df2])
combined.to_excel('combined.xlsx', index=False)

Earn a Top AI Certification

Demonstrate you can effectively and responsibly use AI.

Natassha Selvaraj's photo
Author
Natassha Selvaraj
LinkedIn
Twitter

Natassha is a data consultant who works at the intersection of data science and marketing. She believes that data, when used wisely, can inspire tremendous growth for individuals and organizations. As a self-taught data professional, Natassha loves writing articles that help other data science aspirants break into the industry. Her articles on her personal blog, as well as external publications garner an average of 200K monthly views.

Topics

Learn more about Python and Spreadsheets

Course

Introduction to Importing Data in Python

3 hr
324.1K
Learn to import data into Python from various sources, such as Excel, SQL, SAS and right from the web.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

cheat-sheet

Importing Data in Python Cheat Sheet

With this Python cheat sheet, you'll have a handy reference guide to importing your data, from flat files to files native to other software and relational databases.
Karlijn Willems's photo

Karlijn Willems

Tutorial

openpyxl: Automate Excel Tasks with Python

Learn how to read, write, and style Excel files using openpyxl. Discover practical examples for working with modern Excel formats (.xlsx) and advanced features.
Vinod Chugani's photo

Vinod Chugani

Tutorial

Using Python to Power Spreadsheets in Data Science

Learn how Python can be used more effectively than Excel, with the Pandas package.
Jason Graham's photo

Jason Graham

Tutorial

Python in Excel: Advanced Analysis in Your Excel Spreadsheet

Learn how to use Python in Excel to run advanced analytics, create visuals, and apply machine learning within your existing spreadsheet workflows.
Allan Ouko's photo

Allan Ouko

Tutorial

How to Analyze Data in Google Sheets With Python: A Step-By-Step Guide

Boost your data analysis skills with our step-by-step guide on how to analyze, manipulate and write back data in Google Sheets using Python.
Filip Schouwenaars's photo

Filip Schouwenaars

Tutorial

VBA Excel: How to Get Started and Make Your Work Easier

Learn how to effectively use VBA in Excel to automate tasks, create macros, and enhance your data processing skills with practical examples and best practices.
Laiba Siddiqui's photo

Laiba Siddiqui

See MoreSee More