Skip to main content
HomeAbout PythonLearn Python

Python Select Columns Tutorial

Use Python Pandas and select columns from DataFrames. Follow our tutorial with code examples and learn different ways to select your data today!
Sep 2020  · 7 min read

If you have a DataFrame and would like to access or select a specific few rows/columns from that DataFrame, you can use square brackets or other advanced methods such as loc and iloc.

Selecting Columns Using Square Brackets

Now suppose that you want to select the country column from the brics DataFrame. To achieve this, you will type brics and then the column label inside the square brackets.

Selecting a Column

          country     capital        area    population
BR         Brazil    Brasilia       8.516        200.40
RU         Russia      Moscow      17.100        143.50
IN         India    New Dehli       3.286       1252.00
CH         China      Beijing       9.597       1357.00
SA  South Africa     Pretoria       1.221         52.98
brics["country"]
BR         Brazil
RU         Russia
IN         India
CH         China
SA  South Africa
Name: country, dtype: object

Checking the Type of the Object

Let's check the type of the object that gets returned with the type function.

type(brics["country"])
pandas.core.series.Series

As we can see from the above output, we are dealing with a pandas series here! Series could be thought of as a one-dimensional array that could be labeled just like a DataFrame.

If you want to select data and keep it in a DataFrame, you will need to use double square brackets:

brics[["country"]]
BR         Brazil
RU         Russia
IN         India
CH         China
SA  South Africa

If we check the type of this output, it's a DataFrame! With only one column, though.

type(brics[["country"]])
pandas.core.frame.DataFrame

Selecting Multiple Columns

You can extend this call to select two columns. Let's try to select country and capital.

brics[["country", "capital"]]
          country     capital
BR         Brazil    Brasilia
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing
SA  South Africa     Pretoria

If you look at this closely, you are actually putting a list with column labels inside another set of square brackets and end up with a sub DataFrame containing only the country and capital columns.

Selecting Rows Using Square Brackets

Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame.

Example

You can only select rows using square brackets if you specify a slice, like 0:4. Also, you're using the integer indexes of the rows here, not the row labels!

To get the second, third, and fourth rows of brics DataFrame, we use the slice 1 through 4. Remember that end the of the slice is exclusive, and the index starts at zero.

brics[1:4]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50
IN         India    New Dehli    3.286       1252.00
CH         China      Beijing    9.597       1357.00

These square brackets work, but they only offer limited functionality. Ideally, we would want something similar to 2D Numpy arrays, where you also use square brackets. The index, or slice, before the comma refers to the rows, and the slice after the comma refers to the columns.

Example of 2D Numpy array:

my_array[rows, columns]

If you want to do something similar with pandas, you need to look at using the loc and iloc functions.

  • loc: label-based
  • iloc: integer position-based

loc Function

loc is a technique to select parts of your data based on labels. Let's look at the brics DataFrame and get the rows for Russia.

To achieve this, you will put the label of interest in square brackets after loc.

Selecting Rows

brics.loc["RU"]
country     Russia
capital     Moscow
area          17.1
population   143.5
Name: RU, dtype: object

We get a pandas series containing all of the rows information; inconveniently, though, it is shown on different lines. To get a DataFrame, we have to put the RU sting in another pair of brackets. We can also select multiple rows at the same time. Suppose you want to also include India and China. Simply add those row labels to the list.

brics.loc[["RU", "IN", "CH"]]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50
IN         India    New Dehli    3.286       1252.00
CH         China      Beijing    9.597       1357.00

The difference between using a loc and basic square brackets is that you can extend the selection with a comma and a specification of the columns of interest.

Selecting Rows and Columns

Let's extend the previous call to only include the country and capital columns. We add a comma and list the column labels we want to keep. The intersection gets returned.

brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
          country     capital
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing

You can also use loc to select all rows but only a specific number of columns. Simply replace the first list that specifies the row labels with a colon. A slice going from beginning to end. This time, we get back all of the rows but only two columns.

Selecting All Rows and Specific Columns

brics.loc[:, ["country", "capital"]]
          country     capital
BR         Brazil    Brasilia
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing
SA  South Africa     Pretoria

iloc Function

The iloc function allows you to subset pandas DataFrames based on their position or index.

Selecting Rows

Let's use the same data and similar examples as we did for loc. Let's start by getting the row for Russia.

brics.iloc[[1]]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50

To get the rows for Russia, India, and China. You can now use a list of index 1, 2, 3.

brics.iloc[[1, 2, 3]]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50
IN         India    New Dehli    3.286       1252.00
CH         China      Beijing    9.597       1357.00

Selecting Rows and Columns

Similar to loc, we can also select both rows and columns using iloc. Here, we will select rows for Russia, India, and China and columns country and capital.

brics.iloc[[1, 2, 3], [0, 1]]
          country     capital
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing

Selecting All Rows and Specific Columns

Finally, if you wanted to select all rows but just keep the country and capital columns, you can:

brics.loc[:, [0, 1]]
          country     capital
BR         Brazil    Brasilia
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing
SA  South Africa     Pretoria

loc and iloc functions are pretty similar. The only difference is how you refer to columns and rows.

Interactive Example on Selecting a Subset of Data

In the following example, the cars data is imported from a CSV files as a Pandas DataFrame. To select only the cars_per_cap column from cars, you can use:

cars['cars_per_cap']
cars[['cars_per_cap']]

The single bracket version gives a Pandas Series; the double bracket version gives a Pandas DataFrame.

  • You will use single square brackets to print out the country column of cars as a Pandas Series.
  • Then use double square brackets to print out the country column of cars as a Pandas DataFrame.
  • Finally, use the double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out country column as Pandas Series
print(cars['country'])

# Print out country column as Pandas DataFrame
print(cars[['country']])

# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])

When we run the above code, it produces the following result:

US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
           country
US   United States
AUS      Australia
JPN          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

Try it for yourself.

To learn more about pandas, please see this video from our course Intermediate Python.

This content is taken from DataCamp’s Intermediate Python course by Hugo Bowne-Anderson.

Topics

Python Courses

Certification available

Course

Intermediate Python

4 hr
1.1M
Level up your data science skills by creating visualizations using Matplotlib and manipulating DataFrames with pandas.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

Finding the Size of a DataFrame in Python

There are several ways to find the size of a DataFrame in Python to fit different coding needs. Check out this tutorial for a quick primer on finding the size of a DataFrame. This tutorial presents several ways to check DataFrame size, so you’re sure to find a way that fits your needs.
Amberle McKee's photo

Amberle McKee

5 min

Mastering the Pandas .explode() Method: A Comprehensive Guide

Learn all you need to know about the pandas .explode() method, covering single and multiple columns, handling nested data, and common pitfalls with practical Python code examples.
Adel Nehme's photo

Adel Nehme

5 min

Python NaN: 4 Ways to Check for Missing Values in Python

Explore 4 ways to detect NaN values in Python, using NumPy and Pandas. Learn key differences between NaN and None to clean and analyze data efficiently.
Adel Nehme's photo

Adel Nehme

5 min

Seaborn Heatmaps: A Guide to Data Visualization

Learn how to create eye-catching Seaborn heatmaps
Joleen Bothma's photo

Joleen Bothma

9 min

Test-Driven Development in Python: A Beginner's Guide

Dive into test-driven development (TDD) with our comprehensive Python tutorial. Learn how to write robust tests before coding with practical examples.
Amina Edmunds's photo

Amina Edmunds

7 min

Exponents in Python: A Comprehensive Guide for Beginners

Master exponents in Python using various methods, from built-in functions to powerful libraries like NumPy, and leverage them in real-world scenarios to gain a deeper understanding.
Satyam Tripathi's photo

Satyam Tripathi

9 min

See MoreSee More