Track
openpyxl is an efficient Python library that enables you to read and write Excel files in the modern XML-based formats (.xlsx, .xlsm) introduced in Excel 2007, with full support for advanced features from Excel 2010 onward. It was originally developed as a Python port of PHPExcel and continues to be maintained by an active community of developers. The library excels at data automation, reporting, and formatting workflows, making it especially useful for users who need to work with Excel files but don't have Excel installed on their systems.
Many organizations use openpyxl for automating financial reports, generating charts from database exports, and applying formulas and formatting to marketing reports. This library offers fine-grained control over Excel files entirely through Python code.
What Is openpyxl?
openpyxl is designed specifically to provide native support for Excel 2010+ formats. The library works with several file formats:
-
.xlsx
- Excel workbook -
.xlsm
- Excel workbook with macros -
.xltx
- Excel template -
.xltm
- Excel template with macros
One major advantage of openpyxl is that it doesn't require Excel to be installed on your computer. This makes it ideal for server environments, automated reporting systems, and data processing pipelines. Several other Python libraries, including pandas, use openpyxl as their Excel interface, which demonstrates its reliability and functionality.
Installing and Importing openpyxl
Installing openpyxl is straightforward using pip:
pip install openpyxl
For enhanced security when working with files from untrusted sources, you can also install the optional defusedxml
package:
pip install defusedxml
To use openpyxl in your Python scripts, import the necessary classes:
from openpyxl import Workbook, load_workbook
Key Concepts and Terminology
To use openpyxl effectively, you need to understand the basic Excel structure:
- Workbook: The Excel file itself, containing one or more worksheets
- Worksheet: Individual tabs/sheets within a workbook
- Cell: Individual data points in a worksheet, identified by column letter and row number (e.g., "A1")
- Row: Horizontal line of cells, identified by numbers (1, 2, 3...)
- Column: Vertical line of cells, identified by letters (A, B, C...)
In openpyxl, you can reference cells using either:
-
Excel style references:
sheet[“A1”]
-
Row-column indexing:
sheet.cell(row=1, column=1)
(Note: openpyxl uses 1-based indexing, not 0-based)
Reading Excel Files with openpyxl
Here's a practical example showing how to read data from an Excel file:
Load and explore a workbook
from openpyxl import load_workbook
# Load the workbook - use read_only=True for large files
wb = load_workbook('sample.xlsx', read_only=False, data_only=False)
# data_only=True reads values instead of formulas
The optional parameters control how the workbook is loaded:
-
read_only=True
improves performance for large files but limits editing capabilities -
data_only=True
returns calculated values instead of formulas
Iterate through data
To process multiple cells efficiently, use the iteration methods:
# Iterate through rows
for row in sheet.iter_rows(min_row=1, max_row=5, values_only=True):
print(row) # Returns a tuple of values
# Iterate through columns
for column in sheet.iter_cols(min_col=1, max_col=3, values_only=True):
print(column) # Returns a tuple of values
# Access a range of cells
cell_range = sheet["A1:C5"]
for row in cell_range:
for cell in row:
print(cell.value)
Convert to Python data structures
You can easily transform Excel data into Python data structures:
# Convert worksheet data to a list of dictionaries
def sheet_to_dict(sheet):
data = []
headers = [cell.value for cell in sheet[1]] # First row as headers
for row in sheet.iter_rows(min_row=2, values_only=True):
row_data = {}
for key, value in zip(headers, row):
row_data[key] = value
data.append(row_data)
return data
# Example usage
data_dict = sheet_to_dict(sheet)
print(data_dict)
Writing Excel Files with openpyxl
Creating and modifying Excel files is equally straightforward with openpyxl.
Create a workbook and write basic data
from openpyxl import Workbook
# Create a new workbook
wb = Workbook()
sheet = wb.active
# Rename the sheet
sheet.title = "Data"
# Write values to cells
sheet["A1"] = "Name"
sheet["B1"] = "Age"
sheet["C1"] = "City"
# Add data
data = [
["Alice", 25, "New York"],
["Bob", 30, "San Francisco"],
["Charlie", 35, "Chicago"]
]
for row_idx, row_data in enumerate(data, start=2):
for col_idx, cell_value in enumerate(row_data, start=1):
sheet.cell(row=row_idx, column=col_idx, value=cell_value)
# Save the workbook
wb.save("new_workbook.xlsx")
Modify existing Excel files
from openpyxl import load_workbook
# Load an existing workbook
wb = load_workbook("existing_file.xlsx")
sheet = wb.active
# Modify cell values
sheet["D1"] = "Updated Data"
sheet.cell(row=5, column=2).value = 42
# Save to a new file (to preserve the original)
wb.save("modified_file.xlsx")
Append data dynamically
The .append()
method makes it easy to add new rows:
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
# Add headers
sheet.append(["Date", "Product", "Quantity", "Price"])
# Add multiple rows of data
sales_data = [
["2025-04-01", "Laptop", 5, 1200],
["2025-04-01", "Mouse", 10, 25],
["2025-04-02", "Monitor", 3, 350]
]
for row in sales_data:
sheet.append(row)
wb.save("sales_report.xlsx")
Formatting and Styling in Excel
openpyxl offers extensive options for cell formatting and styling.
Add fonts, borders, and alignment
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, Border, Side, PatternFill, NamedStyle
wb = Workbook()
sheet = wb.active
# Create a header style
header_style = NamedStyle(name="header_style")
header_style.font = Font(bold=True, size=12, color="FFFFFF")
header_style.fill = PatternFill(fill_type="solid", start_color="366092")
header_style.alignment = Alignment(horizontal="center", vertical="center")
header_style.border = Border(
bottom=Side(border_style="medium", color="000000"),
left=Side(border_style="thin"),
right=Side(border_style="thin"),
top=Side(border_style="thin")
)
# Add the style to the workbook
wb.add_named_style(header_style)
# Apply the style to header row
headers = ["ID", "Product", "Category", "Price"]
for col_idx, header in enumerate(headers, start=1):
cell = sheet.cell(row=1, column=col_idx, value=header)
cell.style = "header_style"
# Basic styling without named styles
sheet["A2"].font = Font(bold=True, italic=True)
sheet["A3"].alignment = Alignment(horizontal="center")
sheet["A4"].border = Border(bottom=Side(border_style="thin"))
wb.save("styled_workbook.xlsx")
Conditional formatting
Conditional formatting allows you to apply styles based on cell values:
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.formatting.rule import Rule, ColorScaleRule
wb = Workbook()
sheet = wb.active
# Add sample data
for row in range(1, 11):
for col in range(1, 5):
sheet.cell(row=row, column=col, value=(row-1)*4 + col)
# Add color scale (green to red gradient)
color_scale = ColorScaleRule(
start_type="min", start_color="63BE7B", # Green
mid_type="percentile", mid_value=50, mid_color="FFEB84", # Yellow
end_type="max", end_color="F8696B" # Red
)
sheet.conditional_formatting.add("A1:D10", color_scale)
# Add a rule for values less than 5
red_text = Font(color="FF0000")
red_fill = PatternFill(start_color="FFCCCC", end_color="FFCCCC", fill_type="solid")
dxf = DifferentialStyle(font=red_text, fill=red_fill)
rule = Rule(type="cellIs", operator="lessThan", formula=["5"], dxf=dxf)
sheet.conditional_formatting.add("A1:D10", rule)
wb.save("conditional_format.xlsx")
Advanced Features
openpyxl supports many advanced Excel features that help with complex spreadsheet tasks.
Adding formulas
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
# Add data
for row in range(1, 6):
sheet.cell(row=row, column=1, value=row)
# Add formulas
sheet["B1"] = "=A1*2"
sheet["B2"] = "=A2*2"
sheet["C1"] = "=SUM(A1:A5)"
sheet["C2"] = "=AVERAGE(A1:A5)"
sheet["C3"] = "=COUNTIF(A1:A5,\">2\")"
wb.save("formulas.xlsx")
Note: When using load_workbook()
, formulas are not automatically calculated by openpyxl. To see calculated values, use load_workbook(“file.xlsx”, data_only=True)
.
Insert charts
openpyxl can create various chart types:
from openpyxl import Workbook
from openpyxl.chart import BarChart, LineChart, Reference
wb = Workbook()
sheet = wb.active
# Add data
months = ["Jan", "Feb", "Mar", "Apr", "May"]
values = [30, 45, 37, 50, 62]
for i, (month, value) in enumerate(zip(months, values), start=1):
sheet.cell(row=i, column=1, value=month)
sheet.cell(row=i, column=2, value=value)
# Create a bar chart
bar_chart = BarChart()
bar_chart.title = "Monthly Sales"
bar_chart.x_axis.title = "Month"
bar_chart.y_axis.title = "Sales"
# Define the data range
data = Reference(sheet, min_col=2, min_row=1, max_row=5)
categories = Reference(sheet, min_col=1, min_row=1, max_row=5)
# Add the data to the chart
bar_chart.add_data(data)
bar_chart.set_categories(categories)
# Add the chart to the worksheet
sheet.add_chart(bar_chart, "D1")
# Create a line chart on the same data
line_chart = LineChart()
line_chart.title = "Monthly Sales Trend"
line_chart.add_data(data)
line_chart.set_categories(categories)
sheet.add_chart(line_chart, "D15")
wb.save("charts.xlsx")
Insert Images
from openpyxl import Workbook
from openpyxl.drawing.image import Image
wb = Workbook()
sheet = wb.active
# Add an image
img = Image("logo.png")
# Resize the image (optional)
img.width = 150
img.height = 75
# Add the image to cell A1
sheet.add_image(img, "A1")
wb.save("with_image.xlsx")
Note: You need the Pillow library installed to work with images: pip install pillow
Sheet, Row, and Column Management
openpyxl provides functions to manage the structure of your Excel files:
Creating and removing sheets
from openpyxl import Workbook
wb = Workbook()
# Create new sheets
wb.create_sheet("Data")
wb.create_sheet("Summary", 0) # Add at the beginning
# Remove a sheet
wb.remove(wb["Sheet"]) # Remove the default sheet
print(wb.sheetnames) # ['Summary', 'Data']
wb.save("multiple_sheets.xlsx")
Renaming sheets
wb = Workbook()
sheet = wb.active
sheet.title = "Sales Report"
wb.save("renamed_sheet.xlsx")
Copying worksheets
from openpyxl import Workbook
wb = Workbook()
source = wb.active
source.title = "Original"
# Add some data to copy
source["A1"] = "Test Data"
# Create a copy
wb.copy_worksheet(source)
# The copied sheet will have "Copy of Original" name
wb.save("copied_sheet.xlsx")
Inserting and deleting rows and columns
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
# Add some data
for i in range(1, 6):
for j in range(1, 4):
sheet.cell(row=i, column=j, value=f"R{i}C{j}")
# Insert a row at position 2
sheet.insert_rows(2)
# Insert multiple rows
sheet.insert_rows(5, 3) # Insert 3 rows at position 5
# Insert a column at position 2
sheet.insert_cols(2)
# Delete rows
sheet.delete_rows(7, 2) # Delete 2 rows starting at row 7
# Delete columns
sheet.delete_cols(3) # Delete column C
wb.save("modified_structure.xlsx")
Freezing panes and adding filters
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
# Add headers
headers = ["ID", "Name", "Department", "Salary"]
for col_idx, header in enumerate(headers, start=1):
sheet.cell(row=1, column=col_idx, value=header)
# Add data
data = [
[1, "John Smith", "HR", 55000],
[2, "Jane Doe", "IT", 65000],
[3, "Bob Johnson", "Finance", 60000]
]
for row_idx, row_data in enumerate(data, start=2):
for col_idx, value in enumerate(row_data, start=1):
sheet.cell(row=row_idx, column=col_idx, value=value)
# Freeze the header row
sheet.freeze_panes = "A2"
# Add autofilter
sheet.auto_filter.ref = "A1:D4"
wb.save("freeze_and_filter.xlsx")
Limitations and Community Feedback
While openpyxl is feature-rich, users have noted some limitations:
- Limited support for named ranges
- Documentation examples can be sparse for advanced features
- Performance can be slow for very large files (though read-only mode helps)
Despite these limitations, openpyxl has strong community support. Issues can be reported on the GitLab tracker, and professional support is sometimes available. For more information and help, you can refer to:
Conclusion
openpyxl is a well-established library for Python-based Excel automation that offers complete support for reading, writing, and styling Excel files. It enables detailed control over spreadsheet output in automated workflows, from manipulating individual cells to creating charts and applying complex formatting.
For users who need to generate or analyze Excel files programmatically, particularly when building reports or processing tabular data, openpyxl provides an extensive toolkit. Its ability to work without requiring Excel installation makes it especially valuable for server environments and automated data pipelines.
Whether you're creating financial models, analyzing data, or generating reports, openpyxl gives you the tools to handle Excel files efficiently through Python code.
As an adept professional in Data Science, Machine Learning, and Generative AI, Vinod dedicates himself to sharing knowledge and empowering aspiring data scientists to succeed in this dynamic field.
Become a Python Developer
openpyxl FAQs
What file types does openpyxl support?
openpyxl supports several Excel formats, including .xlsx
, .xlsm
, .xltx
, and xltm
. It does not support the older .xls
format used by Excel 2003 and earlier.
Can openpyxl read and evaluate Excel formulas?
openpyxl can read and write formulas, but it does not evaluate them. To see formula results, the file must first be opened and saved in Excel.
Do I need Microsoft Excel installed to use openpyxl?
No, openpyxl works entirely in Python and does not require Excel to be installed on your machine.
Can I insert images into Excel with openpyxl?
Yes, but you must have the Pillow library installed, and the image file must exist on disk when calling add_image()
.
Is openpyxl suitable for large Excel files?
Yes, especially when using read_only=True
mode, which improves performance for reading large spreadsheets.