
One alternative to using a loop to iterate over a DataFrame is the pandas .apply() method. It works much like Python's built-in map() function: it takes a function as input and applies that function along an axis of a DataFrame, that is, to each column or to each row.

If you are working with tabular data, you specify the axis you want your function to act on: axis=0 (the default) applies the function to each column, and axis=1 applies it to each row.
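To make the axis argument concrete, here is a minimal sketch on a small made-up DataFrame. The name toy_df and its values are illustrative assumptions, not part of the baseball data used in the rest of this tutorial.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the axis argument
toy_df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series
col_range = toy_df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function receives each row as a Series
row_range = toy_df.apply(lambda row: row.max() - row.min(), axis=1)

print(col_range)  # one value per column ('a' and 'b')
print(row_range)  # one value per row (index 0, 1, 2)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row.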

Much like the map() function, the .apply() method can also be used with anonymous (lambda) functions. Let's look at some .apply() examples using baseball data.

Calculating Run Differentials With .apply()

First, you will call the .apply() method on the baseball_df DataFrame. Then you will use a lambda function to iterate over the rows of the DataFrame. For every row, the lambda grabs the RS and RA columns and passes them to the calc_run_diff function. Finally, you will specify axis=1 to tell the .apply() method to apply the function to rows instead of columns.

baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)

Notice that you don't need a for loop. You can collect the run differentials directly into an object called run_diffs_apply. After assigning it to a new column and printing the DataFrame, you will see that the results match what you would get with the .iterrows() method.

run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
baseball_df['RD'] = run_diffs_apply
print(baseball_df)
      Team    League    year   RS    RA    W    G   Playoffs    RD
0      ARI        NL    2012  734   688   81  162          0    46
1      ATL        NL    2012  700   600   94  162          1   100
2      BAL        AL    2012  712   705   93  162          1     7
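The snippets above rely on calc_run_diff and baseball_df being defined elsewhere. As a self-contained sketch, the version below assumes that calc_run_diff simply subtracts runs allowed from runs scored (an assumption that is consistent with the RD values in the printed output) and rebuilds only the Team, RS, and RA columns of the three rows shown.

```python
import pandas as pd

def calc_run_diff(runs_scored, runs_allowed):
    # Assumed definition: run differential = runs scored - runs allowed
    return runs_scored - runs_allowed

# Only the three rows (and three of the columns) from the printed output
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
})

# axis=1 passes each row to the lambda as a Series
baseball_df['RD'] = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)

print(baseball_df)
```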

Interactive Example Using .apply()

The Tampa Bay Rays want you to analyze their data.

They'd like the following metrics:

  • The sum of each column in the data
  • The total number of runs in each year ('RS' + 'RA' for each year)
  • The 'Playoffs' column in text format rather than using 1's and 0's

The below function can be used to convert the 'Playoffs' column to text:

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

Use .apply() to get these metrics. A DataFrame (rays_df) has been printed below. This DataFrame is indexed on the 'Year' column.

       RS   RA   W  Playoffs
2012  697  577  90         0
2011  707  614  91         1
2010  802  649  96         1
2009  803  754  84         0
2008  774  671  97         1

  • Apply sum() to each column of rays_df to collect the sum of each column. Be sure to specify the correct axis.

# Gather sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)

When we run the above code, it produces the following result:

RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64
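The remaining two metrics can be computed along the same lines. The following is one possible sketch rather than the course's own solution: rays_df is rebuilt here from the table printed earlier, and the variable names total_runs and playoffs_text are assumptions chosen for illustration.

```python
import pandas as pd

# rays_df rebuilt from the printed table, indexed by year
rays_df = pd.DataFrame(
    {
        'RS': [697, 707, 802, 803, 774],
        'RA': [577, 614, 649, 754, 671],
        'W': [90, 91, 96, 84, 97],
        'Playoffs': [0, 1, 1, 0, 1],
    },
    index=[2012, 2011, 2010, 2009, 2008],
)

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

# Total runs per year: axis=1 hands each row to the lambda
total_runs = rays_df.apply(lambda row: row['RS'] + row['RA'], axis=1)

# 'Playoffs' as text: convert the value in each row with text_playoffs
playoffs_text = rays_df.apply(
    lambda row: text_playoffs(row['Playoffs']),
    axis=1
)

print(total_runs)
print(playoffs_text)
```

Both calls use axis=1 because each result depends on the values within a single row, whereas the column sums above use axis=0.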

Try it for yourself.

To learn more about pandas alternatives to looping, please see this video from our course Writing Efficient Python Code.

This content is taken from DataCamp’s Writing Efficient Python Code course by Logan Thomas.