How to Drop Columns in Pandas Tutorial

Learn how to drop columns in a pandas DataFrame.
Aug 2020  · 3 min read

Often a DataFrame will contain columns that are not useful to your analysis. Such columns should be dropped from the DataFrame to make it easier for you to focus on the remaining columns.

Columns can be removed either by passing label names together with the corresponding axis, or by passing the names directly to the columns keyword argument (rows work the same way through the index keyword). When the DataFrame uses a multi-index, labels on a specific level can be removed by also specifying the level.
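As a rough illustration of these options, the sketch below uses small made-up DataFrames (the names df, mi, and the column labels are assumptions, not part of the tutorial's dataset). It shows a label combined with an axis, the columns keyword, and a level-based drop on multi-index columns.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Option 1: label name plus the axis it lives on.
by_axis = df.drop("a", axis="columns")

# Option 2: pass the names directly through the columns keyword.
by_keyword = df.drop(columns=["a"])

# Option 3: with multi-index columns, drop a label on a given level.
mi = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_tuples(
        [("x", "left"), ("x", "right"), ("y", "left")]
    ),
)
no_right = mi.drop("right", axis="columns", level=1)

print(list(by_axis.columns))     # ['b', 'c']
print(list(by_keyword.columns))  # ['b', 'c']
print(list(no_right.columns))    # [('x', 'left'), ('y', 'left')]
```

None of these calls modify the original frames, because .drop() returns a new DataFrame unless told otherwise.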

.drop() Method

Let's compare the missing value counts with the shape of the DataFrame ri. Notice that the county_name column has as many missing values as the DataFrame has rows (91,741), meaning that it contains only missing values.

ri.isnull().sum()
state                            0
stop_date                        0
stop_time                        0
county_name                  91741
driver_gender                 5205
driver_race                   5202
...
ri.shape
(91741, 15)

Since it contains no useful information, this column can be dropped using the .drop() method.
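The same check can be done programmatically instead of by eye. The following is a minimal sketch on a made-up stand-in for ri (the frame toy and its values are assumptions); it compares each column's missing count against the row count to find columns that are entirely missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `ri`; the values are invented for illustration.
toy = pd.DataFrame({
    "state": ["RI", "RI", "RI"],
    "county_name": [np.nan, np.nan, np.nan],
    "driver_gender": ["M", np.nan, "F"],
})

missing = toy.isnull().sum()  # missing values per column
n_rows = toy.shape[0]         # first element of .shape is the row count

# A column is entirely missing when its missing count equals the row count.
all_missing = missing[missing == n_rows].index.tolist()
print(all_missing)  # ['county_name']
```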

Besides the column name, you need to specify that you are dropping from the columns axis (axis='columns'). Passing inplace=True modifies ri directly, so no assignment statement is needed, as shown below:

ri.drop('county_name',
  axis='columns', inplace=True)
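For comparison, here is a small sketch of the two styles side by side, again on an invented frame (toy, trimmed, and the sample values are assumptions). Without inplace=True, .drop() leaves the original untouched and returns a new DataFrame that you assign yourself.

```python
import pandas as pd

# Hypothetical two-column frame standing in for `ri`.
toy = pd.DataFrame({"stop_date": ["2005-01-04"], "county_name": [None]})

# Assignment style: the result is a new DataFrame; `toy` is unchanged.
trimmed = toy.drop("county_name", axis="columns")
print(list(toy.columns))      # ['stop_date', 'county_name']
print(list(trimmed.columns))  # ['stop_date']

# In-place style: `toy` itself loses the column.
toy.drop("county_name", axis="columns", inplace=True)
print(list(toy.columns))      # ['stop_date']
```

Which style to use is largely a matter of preference; the assignment form makes it easier to keep the original data around for comparison.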

.dropna() Method

The .dropna() method is a great way to drop rows based on the presence of missing values in that row.

For example, using the dataset above, let's assume the stop_date and stop_time columns are critical to our analysis, and thus a row is useless to us without that data.

ri.head()
    state   stop_date    stop_time    driver_gender   driver_race
0      RI  2005-01-04        12:55                M         White
1      RI  2005-01-23        23:15                M         White
2      RI  2005-02-17        04:15                M         White
3      RI  2005-02-20        17:15                M         White
4      RI  2005-02-24        01:20                F         White

We can tell pandas to drop all rows that have a missing value in either the stop_date or stop_time column. Because we specify a subset, the .dropna() method only takes these two columns into account when deciding which rows to drop.

ri.dropna(subset=['stop_date', 'stop_time'], inplace=True)
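To make the effect of subset concrete, the sketch below applies the same call to a tiny invented frame (toy and its values are assumptions). A row with a missing driver_gender survives, because that column is not part of the subset.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the tutorial's dataset.
toy = pd.DataFrame({
    "stop_date": ["2005-01-04", None, "2005-02-17", "2005-02-20"],
    "stop_time": ["12:55", "23:15", None, "17:15"],
    "driver_gender": ["M", "F", "M", None],
})

# Only stop_date and stop_time are considered when deciding what to drop.
cleaned = toy.dropna(subset=["stop_date", "stop_time"])

print(cleaned.index.tolist())  # [0, 3]
```

Row 1 is dropped for its missing stop_date and row 2 for its missing stop_time, while row 3 is kept despite the missing driver_gender.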

Interactive Example of Dropping Columns

In this example, you will drop two columns that carry no useful information: county_name, because it contains only missing values, and state, because all of the traffic stops took place in one state (Rhode Island).

  • Examine the DataFrame's .shape to find out the number of rows and columns.
  • Drop both the county_name and state columns by passing the column names to the .drop() method as a list of strings.
  • Examine the .shape again to verify that there are now two fewer columns.

# Examine the shape of the DataFrame
print(ri.shape)

# Drop the 'county_name' and 'state' columns
ri.drop(['county_name', 'state'], axis='columns', inplace=True)

# Examine the shape of the DataFrame (again)
print(ri.shape)

When you run the above code, it produces the following result:

(91741, 15)
(91741, 13)
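One practical wrinkle with the in-place version is that running it a second time fails, since the columns no longer exist. The sketch below reproduces this on an invented three-column frame (toy and its values are assumptions) and shows the errors='ignore' option of .drop(), which skips labels that are not found.

```python
import pandas as pd

# Hypothetical miniature of `ri` with just three columns.
toy = pd.DataFrame({
    "state": ["RI", "RI"],
    "county_name": [None, None],
    "stop_date": ["2005-01-04", "2005-01-23"],
})

before = toy.shape
toy.drop(["county_name", "state"], axis="columns", inplace=True)
after = toy.shape
print(before, after)  # (2, 3) (2, 1)

# Dropping the same labels again raises KeyError by default.
rerun_failed = False
try:
    toy.drop(["county_name", "state"], axis="columns", inplace=True)
except KeyError:
    rerun_failed = True

# errors="ignore" silently skips labels that are already gone.
toy.drop(["county_name", "state"], axis="columns", inplace=True, errors="ignore")
print(toy.shape)  # (2, 1)
```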

This content is taken from DataCamp’s Introduction to Data Visualization with ggplot2 course by Kevin Markham.
