Pandas Apply Tutorial

Learn what the pandas .apply() method is and how to use it on DataFrames as an alternative to looping over rows.
Sep 2020  · 3 min read

One alternative to using a loop to iterate over a DataFrame is the pandas .apply() method. It works much like Python's built-in map() function: it takes a function as input and applies it along an axis of the DataFrame, that is, to each column or to each row.

With a DataFrame, the axis argument controls what your function receives. With axis=0 (the default), the function is applied to each column. With axis=1, it is applied to each row.
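Here is a minimal sketch of the axis argument on a small made-up DataFrame. The columns a and b are purely illustrative. Summing with axis=0 gives one result per column, while axis=1 gives one result per row.

```python
import pandas as pd

# Small illustrative DataFrame (not from the article)
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the lambda receives one column (a Series) at a time
col_sums = df.apply(lambda col: col.sum(), axis=0)
print(col_sums.tolist())  # [6, 60]

# axis=1: the lambda receives one row (a Series) at a time
row_sums = df.apply(lambda row: row.sum(), axis=1)
print(row_sums.tolist())  # [11, 22, 33]
```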

Like map(), the .apply() method also works with anonymous (lambda) functions. Let's look at some .apply() examples using baseball data.
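For comparison, the same lambda can be passed to the built-in map() and to pandas. This sketch runs on an illustrative list of numbers. It uses Series.apply, the one-dimensional counterpart of DataFrame.apply, so that each element is handled the way map() handles each item.

```python
import pandas as pd

values = [1, 2, 3]  # illustrative data

# Built-in map(): calls the lambda on every item of an iterable
doubled_list = list(map(lambda x: x * 2, values))

# pandas Series.apply(): calls the same lambda on every element of a Series
doubled_series = pd.Series(values).apply(lambda x: x * 2)

print(doubled_list)              # [2, 4, 6]
print(doubled_series.tolist())   # [2, 4, 6]
```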

Calculating Run Differentials With .apply()

First, call the .apply() method on the baseball_df DataFrame. Then pass it a lambda function, which .apply() calls once for each row. For every row, the lambda grabs the RS (runs scored) and RA (runs allowed) values and passes them to the calc_run_diff function. Finally, specify axis=1 so that .apply() works on rows instead of columns.

baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
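The article does not show calc_run_diff, which comes from earlier in the course. This sketch assumes the function simply returns runs scored minus runs allowed. That assumption is consistent with the RD values printed below. It also rebuilds baseball_df from the three rows shown in that output, so the snippet can run on its own.

```python
import pandas as pd

# Hypothetical helper (assumption): run differential = runs scored - runs allowed
def calc_run_diff(runs_scored, runs_allowed):
    return runs_scored - runs_allowed

# The three rows shown in the printed output of this section
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'League': ['NL', 'NL', 'AL'],
    'year': [2012, 2012, 2012],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
    'W': [81, 94, 93],
    'G': [162, 162, 162],
    'Playoffs': [0, 1, 1],
})

# axis=1: the lambda receives one row at a time and returns one number per row
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1,
)
print(run_diffs_apply.tolist())  # [46, 100, 7]
```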

Notice that no for loop is needed. Because .apply() returns a Series with one value per row, you can collect the run differentials directly into an object called run_diffs_apply. After adding them as a new column (RD) and printing the DataFrame, you will see the same results you would get with the .iterrows() method.

run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
baseball_df['RD'] = run_diffs_apply
print(baseball_df)
      Team    League    year   RS    RA    W    G   Playoffs    RD
0      ARI        NL    2012  734   688   81  162          0    46
1      ATL        NL    2012  700   600   94  162          1   100
2      BAL        AL    2012  712   705   93  162          1     7 
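To check that the two approaches agree, this sketch computes the run differentials twice: once with an .iterrows() loop and once with .apply(). It uses only the RS and RA values from the rows above. The calc_run_diff helper here is the same assumed version (runs scored minus runs allowed), since the article does not define it.

```python
import pandas as pd

# Hypothetical helper (assumption): runs scored - runs allowed
def calc_run_diff(runs_scored, runs_allowed):
    return runs_scored - runs_allowed

baseball_df = pd.DataFrame({'RS': [734, 700, 712], 'RA': [688, 600, 705]})

# Loop-based version: .iterrows() yields (index, row) pairs
run_diffs_loop = []
for i, row in baseball_df.iterrows():
    run_diffs_loop.append(calc_run_diff(row['RS'], row['RA']))

# .apply() version: no explicit loop
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1,
)

print(run_diffs_loop == run_diffs_apply.tolist())  # True
```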

Interactive Example Using .apply()

The Tampa Bay Rays want you to analyze their data.

They'd like the following metrics:

  • The sum of each column in the data
  • The total number of runs scored in each year's games ('RS' + 'RA' for each year)
  • The 'Playoffs' column in text format rather than as 1s and 0s

The following function can be used to convert the 'Playoffs' column to text:

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

Use .apply() to get these metrics. The rays_df DataFrame, indexed on the 'Year' column, is printed below.

       RS   RA   W  Playoffs
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1 

  • Apply sum() to each column of rays_df to collect the sum of each column. Be sure to specify the correct axis.

# Gather sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)

When we run the above code, it produces the following result:

RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64
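The other two metrics follow the same pattern. In this sketch, rays_df is rebuilt from the table above. The total runs come from a row-wise .apply() over the RS and RA columns. The text version of 'Playoffs' comes from applying text_playoffs to that single column with Series.apply, which calls the function once per value, so no axis needs to be specified. The names total_runs and textual_playoffs are illustrative.

```python
import pandas as pd

# rays_df rebuilt from the table shown above, indexed on Year
rays_df = pd.DataFrame(
    {'RS': [697, 707, 802, 803, 774],
     'RA': [577, 614, 649, 754, 671],
     'W': [90, 91, 96, 84, 97],
     'Playoffs': [0, 1, 1, 0, 1]},
    index=pd.Index([2012, 2011, 2010, 2009, 2008], name='Year'),
)

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

# Total runs per year: select RS and RA, then sum across each row (axis=1)
total_runs = rays_df[['RS', 'RA']].apply(sum, axis=1)
print(total_runs.tolist())  # [1274, 1321, 1451, 1557, 1445]

# Playoffs as text: text_playoffs is called on each value of the column
textual_playoffs = rays_df['Playoffs'].apply(text_playoffs)
print(textual_playoffs.tolist())  # ['No', 'Yes', 'Yes', 'No', 'Yes']
```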


To learn more about pandas alternatives to looping, see DataCamp's course Writing Efficient Python Code.

This content is taken from DataCamp’s Intermediate Python course by Logan Thomas.
