
Pandas Apply Tutorial

Learn what Python pandas .apply is and how to use it for DataFrames. Learn how to iterate over DataFrames using the .apply() function today!
Sep 17, 2020  · 3 min read

One alternative to using a loop to iterate over a DataFrame is the pandas .apply() method. It works much like Python's built-in map() function: it takes a function as input and applies that function across a DataFrame, either to each column or to each row.

When you call .apply() on a DataFrame, use the axis argument to choose what your function acts on: axis=0 (the default) passes each column to the function, and axis=1 passes each row.
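To make the axis argument concrete, here is a minimal sketch on a small, made-up DataFrame (df and its columns A and B are hypothetical). It passes Python's built-in sum to .apply() once with each axis.

```python
import pandas as pd

# A tiny hypothetical DataFrame, used only to show what each axis passes in
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# axis=0 (the default): sum receives each column as a Series
col_totals = df.apply(sum, axis=0)
print(col_totals)  # one total per column: A -> 6, B -> 60

# axis=1: sum receives each row as a Series
row_totals = df.apply(sum, axis=1)
print(row_totals)  # one total per row: 11, 22, 33
```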

Like map(), the .apply() method also accepts anonymous (lambda) functions. Let's look at some .apply() examples using baseball data.
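The parallel with map() can be seen side by side in this sketch, which uses made-up numbers: the same doubling lambda is passed to map() over a list and to .apply() over a hypothetical DataFrame.

```python
import pandas as pd

numbers = [1, 2, 3]

# Built-in map() calls the lambda once per list element
doubled_list = list(map(lambda x: x * 2, numbers))
print(doubled_list)  # [2, 4, 6]

# .apply() with the default axis=0 calls the lambda once per column;
# col * 2 doubles every value in that column (a Series)
df = pd.DataFrame({'A': numbers, 'B': [10, 20, 30]})
doubled_df = df.apply(lambda col: col * 2)
print(doubled_df['A'].tolist())  # [2, 4, 6]
```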

Calculating Run Differentials With .apply()

First, call the .apply() method on the baseball_df DataFrame and pass it a lambda function, which receives one row of the DataFrame at a time. For every row, the lambda grabs the RS (runs scored) and RA (runs allowed) columns and passes them to the calc_run_diff function. Finally, specify axis=1 to tell .apply() to act on the rows instead of the columns.

baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
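The tutorial doesn't show calc_run_diff or how baseball_df is built, so this self-contained sketch assumes calc_run_diff simply subtracts runs allowed from runs scored, and rebuilds baseball_df from the three rows in the printed output of this section. Treat both as stand-ins, not the course's exact code.

```python
import pandas as pd

# Hypothetical helper: assumed to return runs scored minus runs allowed
def calc_run_diff(runs_scored, runs_allowed):
    return runs_scored - runs_allowed

# The three rows shown in this tutorial's printed output
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'League': ['NL', 'NL', 'AL'],
    'year': [2012, 2012, 2012],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
    'W': [81, 94, 93],
    'G': [162, 162, 162],
    'Playoffs': [0, 1, 1],
})

# axis=1: the lambda receives one row (a Series) at a time
run_diffs = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
print(run_diffs.tolist())  # [46, 100, 7]
```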

Notice that you don't need a for loop. You can collect the run differentials directly into an object called run_diffs_apply. After storing them in a new RD column and printing the DataFrame, you will see the same results you would get by looping with the .iterrows() method.

run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
baseball_df['RD'] = run_diffs_apply
print(baseball_df)
      Team    League    year   RS    RA    W    G   Playoffs    RD
0      ARI        NL    2012  734   688   81  162          0    46
1      ATL        NL    2012  700   600   94  162          1   100
2      BAL        AL    2012  712   705   93  162          1     7 
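To check that .apply() matches an .iterrows() loop, this sketch computes the run differentials both ways. It again assumes a hypothetical calc_run_diff that returns runs scored minus runs allowed, and keeps only the Team, RS, and RA columns from the output above.

```python
import pandas as pd

def calc_run_diff(runs_scored, runs_allowed):
    # Hypothetical helper, assumed to be runs scored minus runs allowed
    return runs_scored - runs_allowed

# Only the columns needed for this comparison
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
})

# Loop version: .iterrows() yields (index, row) pairs
run_diffs_iterrows = []
for _, row in baseball_df.iterrows():
    run_diffs_iterrows.append(calc_run_diff(row['RS'], row['RA']))

# .apply() version: no explicit loop or list to manage
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)

# Both approaches produce the same run differentials
print(run_diffs_iterrows == run_diffs_apply.tolist())  # True
```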

Interactive Example Using .apply()

The Tampa Bay Rays want you to analyze their data.

They'd like the following metrics:

  • The sum of each column in the data
  • The total number of runs scored in a year ('RS' + 'RA' for each year)
  • The 'Playoffs' column in text format rather than as 1s and 0s

The below function can be used to convert the 'Playoffs' column to text:

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

Use .apply() to get these metrics. The rays_df DataFrame is printed below; it is indexed on the 'Year' column.

       RS   RA   W  Playoffs
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1

  • Apply sum() to each column of rays_df to collect the sum of each column. Be sure to specify the correct axis.
# Gather sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)

When we run the above code, it produces the following result:

RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64

Try it for yourself.
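If you want to check your work on the other two metrics, here is one possible approach. It rebuilds rays_df from the table above (the rebuild is an assumption about how the data was loaded). Both calls use axis=1, so each lambda receives one year's row at a time.

```python
import pandas as pd

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

# Rebuild rays_df from the printed table, indexed on Year
rays_df = pd.DataFrame(
    {
        'RS': [697, 707, 802, 803, 774],
        'RA': [577, 614, 649, 754, 671],
        'W': [90, 91, 96, 84, 97],
        'Playoffs': [0, 1, 1, 0, 1],
    },
    index=pd.Index([2012, 2011, 2010, 2009, 2008], name='Year'),
)

# Total runs per year: add RS and RA within each row
total_runs = rays_df.apply(lambda row: row['RS'] + row['RA'], axis=1)
print(total_runs)

# Playoffs as text: pass each row's Playoffs value to text_playoffs
textual_playoffs = rays_df.apply(lambda row: text_playoffs(row['Playoffs']), axis=1)
print(textual_playoffs)
```

For the 'Playoffs' conversion, rays_df['Playoffs'].apply(text_playoffs) would work as well, since Series.apply calls the function once per value.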

To learn more about pandas alternatives to looping, see the related video from our course Writing Efficient Python Code.

This content is taken from DataCamp’s Intermediate Python course by Logan Thomas.
