Skip to main content
HomeAbout PythonLearn Python

Pandas Apply Tutorial

Sep 2020  · 3 min read

One alternative to using a loop to iterate over a DataFrame is to use the pandas .apply() method. This function acts as a map() function in Python. It takes a function as an input and applies this function to an entire DataFrame.

If you are working with tabular data, you must specify an axis you want your function to act on (0 for columns; and 1 for rows).

Much like the map() function, the apply() method can also be used with anonymous functions or lambda functions. Let's look at some apply() examples using baseball data.

Calculating Run Differentials With .apply()

First, you will call the .apply() method on the basebal_df dataframe. Then use the lambda function to iterate over the rows of the dataframe. For every row, we grab the RS and RA columns and pass them to the calc_run_diff function. Finally, you will specify the axis=1 to tell the .apply() method that we want to apply it on the rows instead of columns.

baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)

You will notice that we don't need to use a for loop. You can collect the run differentials directly into an object called run_diffs_apply. After creating a new column and printing the dataframe you will notice that the results are similar to what you will get with the .iterrows() method.

run_diffs_apply = baseball_df.apply(
         lambda row: calc_run_diff(row['RS'], row['RA']),
         axis=1)
baseball_df['RD'] = run_diffs_apply
print(baseball_df)
      Team    League    year   RS    RA    W    G   Playoffs    RD
0      ARI        NL    2012  734   688   81  162          0    46
1      ATL        NL    2012  700   600   94  162          1   100
2      BAL        AL    2012  712   705   93  162          1     7 

Interactive Example Using .apply()

The Tampa Bay Rays want you to analyze their data.

They'd like the following metrics:

  • The sum of each column in the data
  • The total amount of runs scored in a year ('RS' + 'RA' for each year)
  • The 'Playoffs' column in text format rather than using 1's and 0's

The below function can be used to convert the 'Playoffs' column to text:

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
      return 'No'

Use .apply() to get these metrics. A DataFrame (rays_df) has been printed below. This DataFrame is indexed on the 'Year' column.

       RS   RA   W  Playoffs
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1 
  • Apply sum() to each column of the rays_df to collect the sum of each column. Be sure to specify the correct axis.
# Gather sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)

When we run the above code, it produces the following result:

RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64

Try it for yourself.

To learn more about pandas alternative to looping, please see this video from our course Writing Efficient Python Code.

This content is taken from DataCamp’s Intermediate Python course by Logan Thomas.

Learn more about Python and pandas

Data Manipulation with pandas

Beginner
4 hr
273.1K
Learn how to import and clean data, calculate statistics, and create visualizations with pandas.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

Pandas 2.0: What’s New and Top Tips

Dive into pandas 2.0, the latest update of the essential data analysis library, with new features like PyArrow integration, nullable data types, and non-nanosecond datetime resolution for better performance and efficiency.
Moez Ali's photo

Moez Ali

9 min

PyTorch 2.0 is Here: Everything We Know

Explore the latest release of PyTorch, which is faster, more Pythonic, and more dynamic.
Abid Ali Awan's photo

Abid Ali Awan

6 min

An Introduction to Python T-Tests

Learn how to perform t-tests in Python with this tutorial. Understand the different types of t-tests - one-sample test, two-sample test, paired t-test, and Welch’s test, and when to use them.
Vidhi Chugh's photo

Vidhi Chugh

13 min

Matplotlib time series line plot

This tutorial explores how to create and customize time series line plots in matplotlib.
Elena Kosourova's photo

Elena Kosourova

8 min

Step-by-Step Guide to Making Map in Python using Plotly Library

Make your data stand out with stunning maps created with Plotly in Python
Moez Ali's photo

Moez Ali

7 min

High Performance Data Manipulation in Python: pandas 2.0 vs. polars

Discover the main differences between Python’s pandas and polars libraries for data science
Javier Canales Luna's photo

Javier Canales Luna

16 min

See MoreSee More