
Finding the Size of a DataFrame in Python

There are several ways to find the size of a DataFrame in Python. This tutorial is a quick primer on the most common ones, so you can pick the approach that fits your needs.
Feb 14, 2024  · 5 min read

DataFrames are a widely used data structure in Python scripts, most commonly provided by the pandas library. Knowing the size of a DataFrame is useful for many purposes, including estimating how much memory the DataFrame will need and ensuring your script does not try to access an element outside the bounds of the DataFrame. Fortunately, there are several ways to find the size of a DataFrame in Python, so you can choose the one that suits your coding style and situation.

Let’s discuss how to find the size of a DataFrame in Python.

Understanding Python DataFrames

DataFrames are a way of organizing information in Python that is very common in data science. There are a few key components that make DataFrames exceptionally useful in data projects.

Firstly, the information in DataFrames is organized like a table, which is easy to read and understand. Secondly, the information is mutable, which means elements in the DataFrame can be changed after creation. You can easily add new elements or update or remove existing elements within a DataFrame.
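As a minimal sketch of this mutability, the snippet below updates, adds, and removes data in a small DataFrame. The sample names and values are made up for illustration; `loc` and `drop` are standard pandas calls.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Update an existing element (row label 0, column 'Age')
df.loc[0, 'Age'] = 26

# Add a new column
df['City'] = ['New York', 'San Francisco']

# Remove a column; drop() returns a new DataFrame by default
df = df.drop(columns=['Age'])

print(df.shape)  # (2, 2)
```

Note that the size of the DataFrame changes as columns are added and removed, which is one reason to check it after such operations.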

DataFrames are also useful for their ordering. Elements are kept in the DataFrame in the same order that they are added unless explicitly changed, such as by sorting.

Lastly, DataFrames contain an index, which by default starts from 0. Rows and columns can also be selected by their zero-based position within the DataFrame.
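To illustrate position-based selection, here is a short sketch using `iloc`, the pandas indexer for zero-based positions. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22]})

# Select the first row (position 0) as a Series
first_row = df.iloc[0]

# Select a single element: row position 1, column position 1
bob_age = df.iloc[1, 1]

print(first_row['Name'])  # Alice
print(bob_age)            # 30
```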

You can learn more about DataFrames in DataCamp’s data manipulation with pandas course or this Python pandas tutorial.

Python DataFrame Size: Using df.shape in Pandas for general use

pandas is a Python library that allows analysts to easily work with DataFrames. It provides a straightforward shape attribute that returns the size of a DataFrame as a (rows, columns) tuple.

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Using shape to get the size
rows, columns = df.shape
print(f"Number of rows: {rows}, Number of columns: {columns}")
Output: Number of rows: 3, Number of columns: 3

The df.shape attribute provides the number of rows and columns in a DataFrame quickly and easily. Because it is an attribute rather than a method, it is written without parentheses.

Key takeaway: df.shape is your go-to attribute for finding the size of a DataFrame.
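If only one dimension is needed, the tuple returned by df.shape can be indexed directly. The sketch below also uses the related `df.size` attribute, which is not covered above; it gives the total number of elements (rows multiplied by columns).

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

num_rows = df.shape[0]     # first entry of the tuple: rows
num_columns = df.shape[1]  # second entry of the tuple: columns
total_cells = df.size      # rows * columns

print(num_rows, num_columns, total_cells)  # 3 3 9
```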

Using len() for row number only

One of the simplest and most commonly used ways to find the length of a list, the built-in len() function can also be used to find the number of rows in a DataFrame. This approach is concise and efficient. However, it provides less information than the df.shape attribute, since it reports rows only.

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Using len to get the number of rows
num_rows = len(df)
print(f"Number of rows: {num_rows}")
Output: Number of rows: 3

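When it comes to checking the size of a DataFrame, len() is used less often than df.shape because it reports only the number of rows. However, it can be a quick way to get the row count with a built-in Python function rather than a pandas-specific attribute. The DataFrame itself still comes from pandas.

Key takeaway: len() is a built-in function that offers a quick alternative to df.shape when you only need the number of rows.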
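The same built-in function can also count columns when applied to `df.columns`. The following is a small sketch of that idea, using the sample DataFrame from this section.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

num_rows = len(df.index)       # same result as len(df)
num_columns = len(df.columns)  # number of column labels

print(num_rows, num_columns)  # 3 3
```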

df.info() for more detailed information

For situations where a more detailed measure of size is required, try pandas’ df.info() method. This approach provides the number of rows and columns in the DataFrame, as well as the data type of each column, the number of non-null values in each column, and an estimate of memory usage.

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Using info to get information about the DataFrame
df.info()
Output: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object

dtypes: int64(1), object(2)
memory usage: 204.0+ bytes

With this method, the number of rows is listed under RangeIndex. In the example above, it shows that there are three rows (called entries here) and that the index starts at 0 and ends at 2. The number of columns is listed underneath. Following these, each column’s name is listed along with the number of non-null entries in each column and its data type.

Key takeaway: df.info() can provide more detailed information about a DataFrame.
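When the goal is to estimate memory, the `memory_usage` method and the `memory_usage='deep'` option of `df.info()` may be useful. This is a sketch rather than a definitive recipe; the exact byte counts depend on the pandas version and platform, so no expected output is shown.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Bytes used by the index and each column; deep=True also measures
# the Python strings stored in object columns
per_column = df.memory_usage(deep=True)
total_bytes = per_column.sum()

print(per_column)
print(f"Total: {total_bytes} bytes")

# The same deep measurement, reported in the df.info() summary
df.info(memory_usage='deep')
```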

Python DataFrame Size Best Practices and Tips

When finding the size of a DataFrame in Python, there are a few best practices to keep in mind.

  1. Choose the method that works best for your DataFrame. Remember, df.shape quickly gives the number of rows and columns, len() gives the number of rows only, and df.info() gives extra information you may or may not need for your purpose.
  2. Make sure you have installed and imported any libraries you need. The pandas library is a staple when working with DataFrames.
  3. Document your work well. Make sure to use descriptive comments so future coders can decipher what you did and why.
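One practical use of a size check is guarding against out-of-bounds access. The sketch below shows one possible approach; `get_row_or_none` is a hypothetical helper name chosen for illustration, not part of pandas.

```python
import pandas as pd


def get_row_or_none(df, position):
    """Return the row at a zero-based position, or None if out of bounds.

    Hypothetical helper for illustration; not a pandas function.
    """
    # len(df) is the number of rows, so valid positions are 0 .. len(df) - 1
    if 0 <= position < len(df):
        return df.iloc[position]
    return None


df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22]})

print(get_row_or_none(df, 2)['Name'])  # Charlie
print(get_row_or_none(df, 5))          # None
```

Without such a check, `df.iloc[5]` on this three-row DataFrame would raise an IndexError.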

Conclusion

There are multiple ways to find the size of a DataFrame in Python, depending on your preferences and code requirements. Whether you need simplicity or detailed insights, there's an approach that suits your specific needs.

Always consider the nature of your data and the insights you aim to derive when determining which approach to use. To learn more ways to use Python DataFrames, check out DataCamp’s Introduction to Python course or the Intermediate Python for Finance course. Or try out DataCamp’s data scientist in Python career track.

You can also check out polars, a newer competitor to pandas for high-performance DataFrame analysis. You can read more about the difference between pandas and polars or discover an introduction to using polars.


Author
Amberle McKee

I am a PhD with 13 years of experience working with data in a biological research environment. I create software in several programming languages including Python, MATLAB, and R. I am passionate about sharing my love of learning with the world.
