Skip to main content

Pandas Apply Tutorial

Sep 2020  · 3 min read

One alternative to using a loop to iterate over a DataFrame is to use the pandas .apply() method. This function acts as a map() function in Python. It takes a function as an input and applies this function to an entire DataFrame.

If you are working with tabular data, you must specify an axis you want your function to act on (0 for columns; and 1 for rows).

Much like the map() function, the apply() method can also be used with anonymous functions or lambda functions. Let's look at some apply() examples using baseball data.

Calculating Run Differentials With .apply()

First, you will call the .apply() method on the basebal_df dataframe. Then use the lambda function to iterate over the rows of the dataframe. For every row, we grab the RS and RA columns and pass them to the calc_run_diff function. Finally, you will specify the axis=1 to tell the .apply() method that we want to apply it on the rows instead of columns.

baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)

You will notice that we don't need to use a for loop. You can collect the run differentials directly into an object called run_diffs_apply. After creating a new column and printing the dataframe you will notice that the results are similar to what you will get with the .iterrows() method.

run_diffs_apply = baseball_df.apply(
         lambda row: calc_run_diff(row['RS'], row['RA']),
         axis=1)
baseball_df['RD'] = run_diffs_apply
print(baseball_df)
      Team    League    year   RS    RA    W    G   Playoffs    RD
0      ARI        NL    2012  734   688   81  162          0    46
1      ATL        NL    2012  700   600   94  162          1   100
2      BAL        AL    2012  712   705   93  162          1     7 

Interactive Example Using .apply()

The Tampa Bay Rays want you to analyze their data.

They'd like the following metrics:

  • The sum of each column in the data
  • The total amount of runs scored in a year ('RS' + 'RA' for each year)
  • The 'Playoffs' column in text format rather than using 1's and 0's

The below function can be used to convert the 'Playoffs' column to text:

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
      return 'No'

Use .apply() to get these metrics. A DataFrame (rays_df) has been printed below. This DataFrame is indexed on the 'Year' column.

       RS   RA   W  Playoffs
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1 
  • Apply sum() to each column of the rays_df to collect the sum of each column. Be sure to specify the correct axis.
# Gather sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)

When we run the above code, it produces the following result:

RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64

Try it for yourself.

To learn more about pandas alternative to looping, please see this video from our course Writing Efficient Python Code.

This content is taken from DataCamp’s Writing Efficient Python Code course by Logan Thomas.

Introduction to Python

Beginner
4 hours
4,596,566
Master the basics of data analysis with Python in just four hours. This online course will introduce the Python interface and explore popular packages.
See DetailsRight Arrow
Start Course

Intermediate Python

Beginner
4 hours
883,266
Level up your data science skills by creating visualizations using Matplotlib and manipulating DataFrames with pandas.

Python Data Science Toolbox (Part 1)

Beginner
3 hours
344,436
Learn the art of writing your own functions in Python, as well as key concepts like scoping and error handling.
See all coursesRight Arrow
Related

The 23 Top Python Interview Questions & Answers

Essential Python interview questions with examples for job seekers, final-year students, and data professionals.
Abid Ali Awan's photo

Abid Ali Awan

22 min

Working with Dates and Times in Python Cheat Sheet

Working with dates and times is essential when manipulating data in Python. Learn the basics of working with datetime data in this cheat sheet.
DataCamp Team's photo

DataCamp Team

Plotly Express Cheat Sheet

Plotly is one of the most widely used data visualization packages in Python. Learn more about it in this cheat sheet.
DataCamp Team's photo

DataCamp Team

0 min

Getting started with Python cheat sheet

Python is the most popular programming language in data science. Use this cheat sheet to jumpstart your Python learning journey.
DataCamp Team's photo

DataCamp Team

8 min

Python pandas tutorial: The ultimate guide for beginners

Are you ready to begin your pandas journey? Here’s a step-by-step guide on how to get started. [Updated November 2022]
Vidhi Chugh's photo

Vidhi Chugh

15 min

Python Iterators and Generators Tutorial

Explore the difference between Python Iterators and Generators and learn which are the best to use in various situations.
Kurtis Pykes 's photo

Kurtis Pykes

10 min

See MoreSee More