Tutorials
python

Python Select Columns

If you have a DataFrame and would like to access or select a specific few rows/columns from that DataFrame, you can use square brackets or other advanced methods such as loc and iloc.

Selecting Columns Using Square Brackets

Now suppose that you want to select the country column from the brics DataFrame. To achieve this, you will type brics and then the column label inside the square brackets.

Selecting a Column

          country     capital        area    population
BR         Brazil    Brasilia       8.516        200.40
RU         Russia      Moscow      17.100        143.50
IN         India    New Dehli       3.286       1252.00
CH         China      Beijing       9.597       1357.00
SA  South Africa     Pretoria       1.221         52.98
brics["country"]
BR         Brazil
RU         Russia
IN         India
CH         China
SA  South Africa
Name: country, dtype: object

Checking the Type of the Object

Let's check the type of the object that gets returned with the type function.

type(brics["country"])
pandas.core.series.Series

As we can see from the above output, we are dealing with a pandas series here! Series could be thought of as a one-dimensional array that could be labeled just like a DataFrame.

If you want to select data and keep it in a DataFrame, you will need to use double square brackets:

brics[["country"]]
BR         Brazil
RU         Russia
IN         India
CH         China
SA  South Africa

If we check the type of this output, it's a DataFrame! With only one column, though.

type(brics[["country"]])
pandas.core.frame.DataFrame

Selecting Multiple Columns

You can extend this call to select two columns. Let's try to select country and capital.

brics[["country", "capital"]]
          country     capital
BR         Brazil    Brasilia
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing
SA  South Africa     Pretoria

If you look at this closely, you are actually putting a list with column labels inside another set of square brackets and end up with a sub DataFrame containing only the country and capital columns.

Selecting Rows Using Square Brackets

Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame.

Example

You can only select rows using square brackets if you specify a slice, like 0:4. Also, you're using the integer indexes of the rows here, not the row labels!

To get the second, third, and fourth rows of brics DataFrame, we use the slice 1 through 4. Remember that end the of the slice is exclusive, and the index starts at zero.

brics[1:4]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50
IN         India    New Dehli    3.286       1252.00
CH         China      Beijing    9.597       1357.00

These square brackets work, but they only offer limited functionality. Ideally, we would want something similar to 2D Numpy arrays, where you also use square brackets. The index, or slice, before the comma refers to the rows, and the slice after the comma refers to the columns.

Example of 2D Numpy array:

my_array[rows, columns]

If you want to do something similar with pandas, you need to look at using the loc and iloc functions.

  • loc: label-based
  • iloc: integer position-based

loc Function

loc is a technique to select parts of your data based on labels. Let's look at the brics DataFrame and get the rows for Russia.

To achieve this, you will put the label of interest in square brackets after loc.

Selecting Rows

brics.loc["RU"]
country     Russia
capital     Moscow
area          17.1
population   143.5
Name: RU, dtype: object

We get a pandas series containing all of the rows information; inconveniently, though, it is shown on different lines. To get a DataFrame, we have to put the RU sting in another pair of brackets. We can also select multiple rows at the same time. Suppose you want to also include India and China. Simply add those row labels to the list.

brics.loc[["RU", "IN", "CH"]]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50
IN         India    New Dehli    3.286       1252.00
CH         China      Beijing    9.597       1357.00

The difference between using a loc and basic square brackets is that you can extend the selection with a comma and a specification of the columns of interest.

Selecting Rows and Columns

Let's extend the previous call to only include the country and capital columns. We add a comma and list the column labels we want to keep. The intersection gets returned.

brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
          country     capital
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing

You can also use loc to select all rows but only a specific number of columns. Simply replace the first list that specifies the row labels with a colon. A slice going from beginning to end. This time, we get back all of the rows but only two columns.

Selecting All Rows and Specific Columns

brics.loc[:, ["country", "capital"]]
          country     capital
BR         Brazil    Brasilia
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing
SA  South Africa     Pretoria

iloc Function

The iloc function allows you to subset pandas DataFrames based on their position or index.

Selecting Rows

Let's use the same data and similar examples as we did for loc. Let's start by getting the row for Russia.

brics.iloc[[1]]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50

To get the rows for Russia, India, and China. You can now use a list of index 1, 2, 3.

brics.iloc[[1, 2, 3]]
          country     capital     area    population
RU         Russia      Moscow   17.100        143.50
IN         India    New Dehli    3.286       1252.00
CH         China      Beijing    9.597       1357.00

Selecting Rows and Columns

Similar to loc, we can also select both rows and columns using iloc. Here, we will select rows for Russia, India, and China and columns country and capital.

brics.iloc[[1, 2, 3], [0, 1]]
          country     capital
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing

Selecting All Rows and Specific Columns

Finally, if you wanted to select all rows but just keep the country and capital columns, you can:

brics.loc[:, [0, 1]]
          country     capital
BR         Brazil    Brasilia
RU         Russia      Moscow
IN         India    New Dehli
CH         China      Beijing
SA  South Africa     Pretoria

loc and iloc functions are pretty similar. The only difference is how you refer to columns and rows.

Interactive Example on Selecting a Subset of Data

In the following example, the cars data is imported from a CSV files as a Pandas DataFrame. To select only the cars_per_cap column from cars, you can use:

cars['cars_per_cap']
cars[['cars_per_cap']]

The single bracket version gives a Pandas Series; the double bracket version gives a Pandas DataFrame.

  • You will use single square brackets to print out the country column of cars as a Pandas Series.
  • Then use double square brackets to print out the country column of cars as a Pandas DataFrame.
  • Finally, use the double square brackets to print out a DataFrame with both the country and drives_right columns of cars, in this order.
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Print out country column as Pandas Series
print(cars['country'])

# Print out country column as Pandas DataFrame
print(cars[['country']])

# Print out DataFrame with country and drives_right columns
print(cars[['country', 'drives_right']])

When we run the above code, it produces the following result:

US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
           country
US   United States
AUS      Australia
JPN          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True

Try it for yourself.

To learn more about pandas, please see this video from our course Intermediate Python.

This content is taken from DataCamp’s Intermediate Python course by Hugo Bowne-Anderson.