
Finding the Size of a DataFrame in Python

There are several ways to find the size of a DataFrame in Python. This tutorial is a quick primer on the most common ones, so you can pick the approach that fits your needs.
Feb 2024  · 5 min read

DataFrames are a widely used data structure in Python scripts. Knowing the size of a DataFrame matters for many purposes, including estimating how much memory it will need and ensuring your script does not try to access an element outside the bounds of the DataFrame. Fortunately, there are several ways to find the size of a DataFrame in Python, so you can choose the one that suits your coding style and situation.

Let’s discuss how to find the size of a DataFrame in Python.

Understanding Python DataFrames

DataFrames are a way of organizing information in Python that is very common in data science. There are a few key components that make DataFrames exceptionally useful in data projects.

Firstly, the information in DataFrames is organized like a table, which is easy to read and understand. Secondly, the information is mutable, which means elements in the DataFrame can be changed after creation. You can easily add new elements or update or remove existing elements within a DataFrame.

DataFrames are also useful for their ordering. Elements are kept in the DataFrame in the same order that they are added unless explicitly changed, such as by sorting.

Lastly, DataFrames contain an index, which by default starts from 0, and they also let you select an individual element based on its position within the DataFrame.
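
The properties described above can be sketched in a few lines. This is a minimal illustration, assuming pandas is installed; the sample data is invented for the example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 22]})

# Mutability: update an existing value and add a new column
df.loc[1, "Age"] = 31
df["City"] = ["New York", "San Francisco", "Los Angeles"]

# Positional selection: the element in row 0, column 0
first_name = df.iloc[0, 0]
print(first_name)      # Alice

# The default index runs from 0 to len(df) - 1
print(list(df.index))  # [0, 1, 2]
```

Here .loc selects by label and .iloc selects by position; with the default index the two happen to line up for rows.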

You can learn more about DataFrames in DataCamp’s data manipulation with pandas course or this Python pandas tutorial.

Python DataFrame Size: Using df.shape in Pandas for general use

Python pandas is a library that allows analysts to easily work with DataFrames. Each pandas DataFrame has a straightforward shape attribute that holds its size as a (rows, columns) tuple.

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Using shape to get the size
rows, columns = df.shape
print(f"Number of rows: {rows}, Number of columns: {columns}")
Output: Number of rows: 3, Number of columns: 3

The df.shape attribute provides the number of rows and columns in a DataFrame quickly and easily. Because it is an attribute rather than a method, it is written without parentheses.

Key takeaway: df.shape is your go-to attribute for finding the size of a DataFrame.
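
If only one dimension is needed, the tuple returned by df.shape can be indexed directly. The sketch below also shows the related df.size attribute, which is not covered above and is included here only as a complementary option: it gives the total number of cells (rows times columns).

```python
import pandas as pd

# Same sample DataFrame as in the tutorial
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

num_rows = df.shape[0]   # first entry of the tuple: rows
num_cols = df.shape[1]   # second entry of the tuple: columns
total_cells = df.size    # rows * columns

print(num_rows, num_cols, total_cells)  # 3 3 9
```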

Using len() for row number only

The built-in len() function, one of the simplest and most common ways to find the length of a list, can also be used to find the number of rows in a DataFrame. This approach is concise and efficient. However, it provides less information than the df.shape attribute.

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Using len to get the number of rows
num_rows = len(df)
print(f"Number of rows: {num_rows}")
Output: Number of rows: 3

When it comes to checking the size of a DataFrame, len() is used less often than df.shape. However, it is a quick way to get the number of rows using a function that is built into Python. Note that the DataFrame itself still comes from pandas, so the library is still required.

Key takeaway: len() is a built-in function that returns only the number of rows of a DataFrame.
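
As a small extension, len() can also be applied to df.columns to count columns, and the row count can guard positional access so the script does not reach outside the DataFrame. This is a sketch; the position value is an arbitrary number chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

num_rows = len(df)             # number of rows
num_columns = len(df.columns)  # number of columns

# Guard positional access with the row count (position is a made-up example value)
position = 5
if position < num_rows:
    row = df.iloc[position]
else:
    row = None
    print(f"Position {position} is out of bounds for {num_rows} rows")
```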

df.info() for more detailed information

For situations where a more detailed measure of size is required, try pandas’ df.info() method. This approach provides you with the number of rows and columns in the DataFrame, as well as the data type of each column, the number of non-null values in each column, and an estimate of memory usage.

import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Using info to get information about the DataFrame
df.info()
Output: 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   City    3 non-null      object

dtypes: int64(1), object(2)
memory usage: 204.0+ bytes

With this method, the number of rows is listed under RangeIndex. In the example above, it shows that there are three rows (called entries here) and that the index starts at 0 and ends at 2. The number of columns is listed underneath. Following these, each column’s name is listed along with the number of non-null entries in each column and its data type.

Key takeaway: df.info() can provide more detailed information about a DataFrame.
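
The "+" after the memory figure in the output above indicates a lower bound, because the contents of object columns (such as strings) are not fully counted by default. When memory is the main concern, one option is the df.memory_usage() method with deep=True, sketched below. Exact byte counts vary with the pandas version and platform, so none are shown here.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Bytes used by the index and by each column; deep=True also
# measures the Python objects stored in object-dtype columns
per_column = df.memory_usage(deep=True)
total_bytes = int(per_column.sum())

print(per_column)
print(f"Total: {total_bytes} bytes")
```

Calling df.info(memory_usage="deep") reports the same deep total within the usual info summary.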

Python DataFrame Size Best Practices and Tips

When finding the size of a DataFrame in Python, there are a few best practices to keep in mind.

  1. Choose the method that works best for your DataFrame. Remember, df.shape quickly gives the number of rows and columns, len() gives the number of rows only, and df.info() gives extra information you may or may not need for your purpose.
  2. Make sure you have installed and imported any libraries you need. The pandas library is a staple when working with DataFrames.
  3. Document your work well. Make sure to use descriptive comments so future coders can decipher what you did and why.
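
As one way to put these tips together, the sketch below wraps the size checks in a small documented function. The name describe_size is hypothetical and not part of pandas.

```python
import pandas as pd


def describe_size(df):
    """Hypothetical helper: return the basic size measures of a DataFrame."""
    rows, columns = df.shape
    return {"rows": rows, "columns": columns, "cells": int(df.size)}


df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

summary = describe_size(df)
print(summary)  # {'rows': 3, 'columns': 3, 'cells': 9}
```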

Conclusion

There are multiple ways to find the size of a DataFrame in Python, depending on your preferences and code requirements. Whether you need simplicity or detailed insights, there's an approach that suits your specific needs.

Always consider the nature of your data and the insights you aim to derive when determining which approach to use. To learn more ways to use Python DataFrames, check out DataCamp’s Introduction to Python course or the Intermediate Python for Finance course. Or try out DataCamp’s data scientist in Python career track.

You can also check out polars, a newer competitor to pandas for high-performance DataFrame analysis. You can read more about the difference between pandas and polars or discover an introduction to using polars.


Author
Amberle McKee

I am a PhD with 13 years of experience working with data in a biological research environment. I create software in several programming languages including Python, MATLAB, and R. I am passionate about sharing my love of learning with the world.
