Course

Writing Efficient Code with pandas

Skill Level: Intermediate

4.8+

Updated 08/2022

Learn efficient techniques in pandas to optimize your Python code.

Python, Programming

4 hr

14 videos

45 Exercises

3,500 XP

21,690

Statement of Accomplishment

Course Description

The ability to work efficiently with big datasets and extract valuable information is an indispensable skill for every aspiring data scientist. When working with a small amount of data, we often don’t realize how slow code execution can be. This course builds on your knowledge of Python and the pandas library and introduces you to efficient built-in pandas functions that perform tasks faster. These functions let you tackle everything from the simplest tasks, like targeting specific entries and features in the data, to the most complex, like applying functions to groups of entries, much faster than plain Python loops. By the end of this course, you will be able to apply a function to data based on a feature value, iterate through big datasets rapidly, and manipulate data belonging to different groups efficiently. You will apply these methods to a variety of real-world datasets, such as poker hands and restaurant tips.

Prerequisites

Data Manipulation with pandas

1

Selecting columns and rows efficiently

This chapter explains why efficient code matters and shows how to select specific and random rows and columns efficiently.

The need for efficient coding I

What does time.time() measure?

Measuring time I

Measuring time II

Locate rows: .iloc[] and .loc[]

Row selection: .loc[] vs .iloc[]

Column selection: .iloc[] vs by name

Select random rows

Random row selection

Random column selection
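
The selection techniques listed in this chapter can be sketched as follows. This is a minimal illustration and not taken from the course itself: the DataFrame `df`, its column names, and the helper `time_it` are hypothetical, and the printed timings depend on the machine and the pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 rows, three integer columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100_000, 3)), columns=["a", "b", "c"])


def time_it(func, repeats=20):
    """Return the average wall-clock seconds per call, measured with time.time()."""
    start = time.time()
    for _ in range(repeats):
        func()
    return (time.time() - start) / repeats


# Row selection by label (.loc) and by integer position (.iloc).
# With the default RangeIndex, labels and positions coincide, but
# .loc slices include the end label while .iloc slices exclude it.
rows_loc = df.loc[0:999]
rows_iloc = df.iloc[0:1000]

# Column selection by name and by position.
cols_by_name = df[["a", "c"]]
cols_by_pos = df.iloc[:, [0, 2]]

# Random selection with .sample(): 5 random rows, then 2 random columns.
random_rows = df.sample(n=5, random_state=1)
random_cols = df.sample(n=2, axis=1, random_state=1)

print(f".loc rows:  {time_it(lambda: df.loc[0:999]):.6f} s")
print(f".iloc rows: {time_it(lambda: df.iloc[0:1000]):.6f} s")
```

Averaging over several repeats smooths out noise in a single `time.time()` measurement; which selector comes out faster is best checked on your own data.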

2

Replacing values in a DataFrame

This chapter shows how to use the .replace() method to replace one or multiple values using lists and dictionaries.

Replace scalar values using .replace()

Replacing scalar values I

Replace scalar values II

Replace values using lists

Replace multiple values I

Replace multiple values II

Replace values using dictionaries

Replace single values I

Replace single values II

Replace multiple values III

Most efficient method for scalar replacement
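
The three replacement styles named in this chapter can be sketched as below. The table is hypothetical, loosely modeled on a restaurant-tips dataset, and the replacement values are chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, loosely modeled on a restaurant-tips table.
df = pd.DataFrame({
    "day": ["Sun", "Sat", "Thur", "Fri", "Sun"],
    "smoker": ["No", "Yes", "No", "Yes", "No"],
    "size": [2, 3, 2, 4, 2],
})

# Scalar replacement: one value swapped for another across the DataFrame.
scalar = df.replace("Thur", "Thu")

# List replacement: several values mapped to a single value...
many_to_one = df.replace(["Sat", "Sun"], "Weekend")
# ...or pairwise, with two lists of equal length.
pairwise = df.replace(["Yes", "No"], ["smoker", "non-smoker"])

# Dictionary replacement: {old: new} pairs in a single call.
by_dict = df.replace({"Thur": "Thursday", "Fri": "Friday"})

# Nested dictionary: restrict the replacement to one column.
one_column = df.replace({"size": {2: 20}})

print(by_dict)
```

Each call returns a new DataFrame and leaves `df` unchanged, so the variants can be compared side by side.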

3

Efficient iterating

This chapter presents different ways of iterating through a pandas DataFrame and explains why vectorization is the most efficient of them.

Looping using the .iterrows() function

Create a generator for a pandas DataFrame

The iterrows() function for looping

Looping using the .apply() function

.apply() function in every cell

.apply() for rows iteration

Vectorization over pandas series

Why is vectorization in pandas so fast?

pandas vectorization in action

Vectorization with NumPy arrays using .values

Best method of vectorization

Vectorization methods for looping a DataFrame
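
As a rough sketch of the four approaches this chapter compares, the snippet below computes the same row-wise sum in each style. The DataFrame and its column names are hypothetical; the point is only that all four produce identical results, so the choice between them is about speed.

```python
import pandas as pd

# Hypothetical data: two numeric columns to be summed row by row.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})

# 1. .iterrows() returns a generator of (index, row) pairs;
#    typically the slowest of the four options.
total_iterrows = []
for _, row in df.iterrows():
    total_iterrows.append(row["x"] + row["y"])

# 2. .apply() with axis=1 calls the function once per row.
total_apply = df.apply(lambda row: row["x"] + row["y"], axis=1)

# 3. Vectorization over pandas Series: one operation on whole columns.
total_series = df["x"] + df["y"]

# 4. Vectorization over the underlying NumPy arrays.
#    .values is an attribute, not a method, so it takes no parentheses.
total_numpy = df["x"].values + df["y"].values

print(total_series.tolist())  # [11, 22, 33, 44]
```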

4

Data manipulation using .groupby()

This chapter describes the groupby() function and how we can use it to transform values in place, replace missing values and apply complex functions group-wise.

Data transformation using .groupby().transform

The min-max normalization using .transform()

Transforming values to probabilities

Validation of normalization

When to use transform()?

Missing value imputation using transform()

Identifying missing values

Missing value imputation

Data filtration using the filter() function

When to use filtration?

Data filtration

Congratulations!
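
The group-wise operations in this chapter might look like the following sketch. The data, the column names, and the helper `min_max` are hypothetical and chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical data: bills grouped by day, with one missing value.
df = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun", "Sun", "Fri"],
    "total_bill": [10.0, 20.0, np.nan, 30.0, 50.0, 15.0],
})


def min_max(s):
    """Scale a group's values to the range [0, 1]."""
    # A single-row group gives 0/0, which pandas reports as NaN.
    return (s - s.min()) / (s.max() - s.min())


# .transform() returns a result aligned with the original index,
# so it can be assigned straight back as a new column.
df["bill_scaled"] = df.groupby("day")["total_bill"].transform(min_max)

# Group-wise imputation: fill each missing value with its group's mean.
df["bill_filled"] = df.groupby("day")["total_bill"].transform(
    lambda s: s.fillna(s.mean())
)

# Filtration: keep only the rows of groups that have more than one row.
multi_row_days = df.groupby("day").filter(lambda g: len(g) > 1)

print(df)
print(multi_row_days)
```

Unlike an aggregation, `.transform()` keeps one output value per input row, while `.filter()` keeps or drops whole groups based on a condition.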

Rating: 4.8 from 151 reviews (83%, 15%, 1%, 1%, 0%)

FAQs

What does Writing Efficient Code with pandas help me do faster?

You will learn to select data, replace values, iterate through large datasets, and apply group-wise operations much faster using pandas built-in functions instead of standard Python loops.

Why is vectorization important and is it covered here?

Vectorization applies operations to entire columns at once instead of looping row by row, dramatically speeding up your code. Chapter 3 covers it in detail as the most efficient iteration method.
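
To make the difference concrete, here is a small timing sketch that multiplies two columns first with a row-by-row loop and then with a single vectorized operation. The data is randomly generated and hypothetical, and the measured times will vary with hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows of random numbers.
rng = np.random.default_rng(42)
df = pd.DataFrame({"x": rng.random(10_000), "y": rng.random(10_000)})

# Row-by-row loop: Python handles one row at a time.
start = time.time()
looped = [row["x"] * row["y"] for _, row in df.iterrows()]
loop_seconds = time.time() - start

# Vectorized: a single multiplication over whole columns.
start = time.time()
vectorized = df["x"] * df["y"]
vectorized_seconds = time.time() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vectorized_seconds:.4f} s")
```

Both versions compute the same values; only the time spent differs.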

What real-world datasets are used in the exercises?

You will work with datasets including poker hands and restaurant tips to practice efficient data manipulation techniques in pandas.

Do I need experience with pandas before taking this course?

Yes. You should have completed Data Manipulation with pandas and Intermediate Python. The course builds on existing pandas knowledge to teach performance optimization.

What will I learn about the groupby function?

You will learn to use groupby to transform values in place, replace missing values within groups, and apply complex functions to grouped data efficiently.
