
Iloc vs Loc in Pandas: A Guide With Examples

.loc selects data using row and column names (labels), while .iloc uses numerical indices (positions). Learn how to use both with examples.
Nov 21, 2024  · 8 min read

One of those annoying things that we’re all trying to figure out when we learn Pandas is the distinction between .loc and .iloc.

Let’s put an end to this confusion and clarify the difference between these two methods. I’ll give plenty of examples, and I hope the distinction will be much clearer by the end of this blog.

What Are .loc and .iloc in Pandas?

Both .loc and .iloc are essential attributes of Pandas DataFrames, and both are used for selecting specific subsets of data. They let us access, and modify, a specific part of a DataFrame rather than the whole DataFrame.

| Feature | .loc | .iloc |
|---|---|---|
| Syntax | df.loc[row_indexer, column_indexer] | df.iloc[row_indexer, column_indexer] |
| Indexing method | Label-based indexing | Position-based indexing |
| Used for reference | Row and column labels (names) | Numerical indices of rows and columns (starting from 0) |

As we can see from the table, the syntax looks very similar. The difference lies in how we use the row_indexer and column_indexer arguments. This is because the two methods offer different approaches to indexing the data: while .loc indexes based on label names, .iloc takes the numerical position index of rows and columns as arguments.

Let’s examine each of the two methods in detail, starting with .loc.

Using .loc: Selection by Labels

To illustrate the concepts, let's consider a hypothetical customer database represented by this DataFrame called df, with the Customer ID representing the row index:

| Customer ID | Name | Country | Region | Age |
|---|---|---|---|---|
| C123 | John Doe | United States | North America | 67 |
| C234 | Petra Müller | Germany | Europe | 51 |
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
| C567 | David Lee | China | Asia | 40 |
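To follow along, the sample table can be rebuilt as a DataFrame. This is a minimal sketch: the column names and values are taken from the table above, and setting Customer ID as the row index via pd.Index(..., name=...) is just one of several equivalent ways to do it.

```python
import pandas as pd

# Example customer data from the table above; "Customer ID" becomes the row index,
# i.e. the labels that .loc works with
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

print(df)
```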

There are four primary ways to select rows with .loc. These include:

  • Selecting a single row
  • Selecting multiple rows
  • Selecting a slice of rows
  • Conditional row selection

Selecting a single row using .loc

To select a single row, we use the label of the row we want to retrieve as row_indexer. Accordingly, the syntax looks like this: df.loc['row_label']. Let’s use this to display all the information on our customer Ali Khan:

df.loc['C345']

| | C345 |
|---|---|
| Name | Ali Khan |
| Country | Pakistan |
| Region | Asia |
| Age | 19 |
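One detail worth knowing: passing a single label returns a pandas Series (which is why the output above is a single column named C345), whereas wrapping the same label in a list returns a one-row DataFrame. A small sketch on the example df; the variable names row and row_df are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

row = df.loc["C345"]       # single label -> Series named "C345"
row_df = df.loc[["C345"]]  # list containing one label -> DataFrame with one row

print(type(row).__name__)     # Series
print(type(row_df).__name__)  # DataFrame
print(row["Name"])            # Ali Khan
```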

Selecting multiple rows using .loc

If we want to select multiple rows that do not necessarily follow each other in order, we have to pass a list of their row labels as the row_indexer argument. This means we need to use not one but two pairs of square brackets: one for the regular .loc syntax and one for the label list.

The line df.loc[['row_label_1', 'row_label_2']] will return the two rows of the df DataFrame specified in the list. Let’s say we wanted information not only on Ali Khan but also on David Lee:

df.loc[['C345', 'C567']]

| Customer ID | Name | Country | Region | Age |
|---|---|---|---|---|
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C567 | David Lee | China | Asia | 40 |
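Because the labels are passed as a list, the rows come back in the order of that list, not in the order of the DataFrame's index. A short sketch on the example df (the name subset is illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# Reversing the list order reverses the row order of the result
subset = df.loc[["C567", "C345"]]
print(subset["Name"].tolist())  # ['David Lee', 'Ali Khan']
```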

Selecting a slice of rows using .loc

We can select a range of rows by passing the first and last row labels with a colon in between: df.loc['row_label_start':'row_label_end']. We could display the first four rows of our DataFrame like this:

df.loc['C123' : 'C456']

| Customer ID | Name | Country | Region | Age |
|---|---|---|---|---|
| C123 | John Doe | United States | North America | 67 |
| C234 | Petra Müller | Germany | Europe | 51 |
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |

There are two things to keep in mind here:

  1. The output includes the row specified in row_label_end. This is different in .iloc, which we’ll cover later.
  2. We only use one pair of square brackets, even though we want to retrieve multiple rows. A slice is not a list, so wrapping it in a second pair of square brackets (df.loc[['C123':'C456']]) would raise a SyntaxError.
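To see the first point in action, compare a label slice with a position slice on the example df. C456 sits at position 3, so the .iloc slice that stops at 3 leaves it out while the .loc slice ending at 'C456' keeps it. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

by_label = df.loc["C123":"C456"]  # end label included -> 4 rows
by_position = df.iloc[0:3]        # end position excluded -> 3 rows

print(len(by_label), len(by_position))  # 4 3
```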

Conditional selection of rows using .loc

We can also return rows based on a conditional expression. We can filter all rows by whether or not they fulfill a certain condition and only display the ones that do.

The corresponding syntax is df.loc[conditional_expression], with the conditional_expression being a statement about the allowed values in a specific column.

For columns with non-numeric data (like Name or Country), we typically use the equality (==) or inequality (!=) operator. Strings can technically be compared with < or >, but that only checks alphabetical order, which is rarely meaningful for names or countries. We could, for instance, return all rows of customers who are not from Asia:

df.loc[df['Region'] != 'Asia']

| Customer ID | Name | Country | Region | Age |
|---|---|---|---|---|
| C123 | John Doe | United States | North America | 67 |
| C234 | Petra Müller | Germany | Europe | 51 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
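Several conditions can be combined with & (and) or | (or), each wrapped in its own parentheses, and Series.isin() checks membership in a list of values. A sketch on the example df; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# Both conditions must hold; the parentheses are required because & binds tightly
young_asia = df.loc[(df["Region"] == "Asia") & (df["Age"] < 30)]
print(young_asia["Name"].tolist())  # ['Ali Khan']

# Rows whose Country is one of several values
picked = df.loc[df["Country"].isin(["Germany", "Mexico"])]
print(picked.index.tolist())  # ['C234', 'C456']
```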

Selecting a single column using .loc

To select columns, we need to specify the column_indexer argument, which comes after the row_indexer argument. If we want to only specify the column_indexer, we need to somehow mark that we want to return all rows and only filter on the columns. Let’s see how we can do it!

Selecting a single column can be done by specifying the column_indexer with the label of the respective column. To retrieve all rows, we specify the row_indexer with a simple colon. The resulting syntax looks like this: df.loc[:, 'column_name'].

Let’s display the Name of each customer:

df.loc[:, 'Name']

| Customer ID | Name |
|---|---|
| C123 | John Doe |
| C234 | Petra Müller |
| C345 | Ali Khan |
| C456 | Maria Gonzalez |
| C567 | David Lee |
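As with rows, a single column label returns a Series (the same one plain df['Name'] gives), while a one-element list returns a one-column DataFrame. A quick sketch on the example df:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

names = df.loc[:, "Name"]       # Series indexed by Customer ID
names_df = df.loc[:, ["Name"]]  # DataFrame with a single column

print(names.equals(df["Name"]))  # True: same result as plain column access
print(names_df.shape)            # (5, 1)
```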

Selecting multiple columns using .loc

Similar to selecting multiple rows, we need to pass a list of column labels if we want to return multiple columns of a DataFrame that do not necessarily follow each other in order: df.loc[:, ['col_label_1', 'col_label_2']].

Assuming we wanted to add all customers’ Age to our last output, it would work like this:

df.loc[:, ['Name', 'Age']]

| Customer ID | Name | Age |
|---|---|---|
| C123 | John Doe | 67 |
| C234 | Petra Müller | 51 |
| C345 | Ali Khan | 19 |
| C456 | Maria Gonzalez | 26 |
| C567 | David Lee | 40 |

Selecting a slice of columns using .loc

Using a colon between the labels of two columns selects all columns that lie between them in the DataFrame’s column order. The slice is inclusive of the end column, so the column named col_end is selected too. The standard syntax is: df.loc[:, 'col_start':'col_end'].

If we were interested in the Name, Country, and Region of our customers, our code line could be:

df.loc[:, 'Name':'Region']

| Customer ID | Name | Country | Region |
|---|---|---|---|
| C123 | John Doe | United States | North America |
| C234 | Petra Müller | Germany | Europe |
| C345 | Ali Khan | Pakistan | Asia |
| C456 | Maria Gonzalez | Mexico | North America |
| C567 | David Lee | China | Asia |
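Since a label slice follows the current column order, it works like a range over positions that .loc looks up from the labels. A sketch on the example df, including a "backwards" slice whose start column comes after its end column:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# Start and end column are both included
print(df.loc[:, "Country":"Age"].columns.tolist())  # ['Country', 'Region', 'Age']

# Region comes after Name in the column order, so this slice selects no columns
print(df.loc[:, "Region":"Name"].shape)  # (5, 0)
```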

Combined row and column selection using .loc

It’s also possible to specify both the row_indexer and the column_indexer. This can be used to retrieve a single piece of information, meaning one cell of the DataFrame. To do this, we specify one row and one column using the syntax df.loc['row_label', 'column_name'].

The more useful case is to return a sub-DataFrame that focuses on exactly the set of rows and columns we are interested in. Both indexers can be given as lists in square brackets or as slices with a colon, and the row selection can even be a conditional expression.

Here is one example of returning the Name, Country, and Region of each customer with an Age of over 30:

df.loc[df['Age'] > 30, 'Name':'Region']

| Customer ID | Name | Country | Region |
|---|---|---|---|
| C123 | John Doe | United States | North America |
| C234 | Petra Müller | Germany | Europe |
| C567 | David Lee | China | Asia |
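The same row/column syntax also works on the left-hand side of an assignment, which is how .loc is used to modify a specific part of a DataFrame in place. A sketch on the example df; the new age value and the Senior column are hypothetical, purely for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

print(df.loc["C345", "Age"])  # 19 (a single cell)

# Update one cell by its row and column labels
df.loc["C345", "Age"] = 20

# Hypothetical flag column: set it only for rows matching a condition
df["Senior"] = False
df.loc[df["Age"] > 60, "Senior"] = True

print(df.loc[:, ["Name", "Age", "Senior"]])
```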

Using .iloc: Selection by Integer Position

.iloc selects by position instead of label. This is the standard syntax of using .iloc: df.iloc[row_indexer, column_indexer]. There are two special things to look out for:

  • Counting starting at 0: The first row and column have the index 0, the second one index 1, etc.
  • Exclusivity of range end value: When using a slice, the row or column specified after the colon is not included in the selection.
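Both rules can be checked quickly on the example df. The sketch below also shows negative positions, which .iloc accepts and counts from the end (-1 is the last row):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

print(df.iloc[0]["Name"])           # John Doe (position 0 is the first row)
print(df.iloc[-1]["Name"])          # David Lee (negative positions count from the end)
print(df.iloc[0:2].index.tolist())  # ['C123', 'C234'] (stop position 2 is excluded)
```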

Selecting a single row using .iloc

A single row can be selected by using the integer representing the row index number as the row_indexer. We don’t need quotation marks since we are entering an integer number and not a label string as we did with .loc. To return the first row of a DataFrame called df, enter df.iloc[0].

In our example DataFrame, this very code line returns the information of John Doe:

df.iloc[0]

| | C123 |
|---|---|
| Name | John Doe |
| Country | United States |
| Region | North America |
| Age | 67 |

Selecting multiple rows using .iloc

Selecting multiple rows works in .iloc as it does in .loc: we enter the row index integers in a list with square brackets. The syntax looks like this: df.iloc[[0, 3, 4]].

The respective output in our customer table can be seen below:

df.iloc[[0, 3, 4]]

| Customer ID | Name | Country | Region | Age |
|---|---|---|---|---|
| C123 | John Doe | United States | North America | 67 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
| C567 | David Lee | China | Asia | 40 |

Selecting a slice of rows using .iloc

For selecting a slice of rows, we use a colon between two specified row index integers. Now, we have to pay attention to the exclusivity mentioned earlier. 

We can take the line df.iloc[1:4] as an example to illustrate this concept. Index number 1 means the second row, so our slice starts there. The index integer 4 represents the fifth row, but since .iloc slices exclude the end value, the output stops just before it. Therefore, it will return the second, third, and fourth rows.

Let’s prove that the line works as it should:

df.iloc[1:4]

| Customer ID | Name | Country | Region | Age |
|---|---|---|---|---|
| C234 | Petra Müller | Germany | Europe | 51 |
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
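Because the end is excluded, a slice start:stop returns exactly stop - start rows. Either bound can also be left out, and an optional step picks every n-th row. A sketch on the example df with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

print(len(df.iloc[1:4]))  # 3, i.e. 4 - 1

first_two = df.iloc[:2]     # omitted start -> begin at position 0
last_two = df.iloc[-2:]     # omitted stop -> run to the end
every_other = df.iloc[::2]  # step of 2 -> positions 0, 2, 4

print(first_two.index.tolist())    # ['C123', 'C234']
print(last_two.index.tolist())     # ['C456', 'C567']
print(every_other.index.tolist())  # ['C123', 'C345', 'C567']
```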

Selecting a single column using .iloc

The logic of selecting columns using .iloc follows what we have learned so far. Let’s see how it works for single columns, multiple columns and column slices.

Just like with .loc, we have to specify the row_indexer before we can proceed to the column_indexer. To retrieve the values of the third column of df for every row, we enter df.iloc[:, 2].

Because Region is the third column in our DataFrame, it will be retrieved as a consequence of that code line:

df.iloc[:, 2]

| Customer ID | Region |
|---|---|
| C123 | North America |
| C234 | Europe |
| C345 | Asia |
| C456 | North America |
| C567 | Asia |
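If you know a column’s name but need its position for .iloc, the pandas method Index.get_loc() translates one into the other. A sketch on the example df:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# The row index (Customer ID) is not a column, so Name is at position 0
pos = df.columns.get_loc("Region")
print(pos)  # 2

regions = df.iloc[:, pos]
print(regions.equals(df.loc[:, "Region"]))  # True
```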

Selecting multiple columns using .iloc

To select multiple columns that are not necessarily subsequent, we can again enter a list containing integers as the column_indexer. The line df.iloc[:, [0, 3]] returns both the first and fourth columns. 

In our case, the information displayed is the Name as well as the Age of each customer:

df.iloc[:, [0, 3]]

| Customer ID | Name | Age |
|---|---|---|
| C123 | John Doe | 67 |
| C234 | Petra Müller | 51 |
| C345 | Ali Khan | 19 |
| C456 | Maria Gonzalez | 26 |
| C567 | David Lee | 40 |

Selecting a slice of columns using .iloc

For slice selection using .iloc, the logic of the column_indexer follows that of the row_indexer. The column represented by the integer after the colon is not included in the output. To retrieve the second and third columns, the code line should look like this: df.iloc[:, 1:3].

This line below returns all the geographical information we have about our customers:

df.iloc[:, 1:3]

| Customer ID | Country | Region |
|---|---|---|
| C123 | United States | North America |
| C234 | Germany | Europe |
| C345 | Pakistan | Asia |
| C456 | Mexico | North America |
| C567 | China | Asia |
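Open-ended and negative bounds work for column slices as well, which is handy for "everything except the first/last column" selections. A sketch on the example df:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

print(df.iloc[:, 1:].columns.tolist())   # ['Country', 'Region', 'Age'] (all but position 0)
print(df.iloc[:, :-1].columns.tolist())  # ['Name', 'Country', 'Region'] (all but the last)
print(df.iloc[:, 1:3].columns.tolist())  # ['Country', 'Region'] (position 3 excluded)
```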

Combined row and column selection using .iloc

We can put together what we learned about .iloc to combine row and column selection. Again, it is possible to either return a single cell or a sub-DataFrame. To return the single cell at the intersection of the third row and the fourth column (positions 2 and 3), we enter df.iloc[2, 3].

Just like with .loc, we can specify both indexers as lists using square brackets, or as slices using the colon. Selecting rows with conditional expressions is where .iloc gets awkward: passing a boolean Series such as df['Age'] > 30 directly raises an error, so the condition first has to be converted to a plain boolean array. Using the label names and .loc is usually more intuitive and less error-prone.

This last example displays Country, Region and Age for the first, second and fifth row in our DataFrame:

df.iloc[[0,1,4], 1:4]

| Customer ID | Country | Region | Age |
|---|---|---|---|
| C123 | United States | North America | 67 |
| C234 | Germany | Europe | 51 |
| C567 | China | Asia | 40 |
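Two details from this section in code: df.iloc[2, 3] returns a single cell, and a condition has to be turned into a plain boolean array (here with Series.to_numpy()) before .iloc accepts it as a row indexer. A sketch on the example df; the exact exception pandas raises for a boolean Series is a ValueError in the versions we know of, so both candidate types are caught to be safe:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

print(df.iloc[2, 3])  # 19 (third row, fourth column: Ali Khan's Age)

mask = df["Age"] > 30  # boolean Series indexed by Customer ID

# .iloc refuses a boolean Series because it carries an index of its own
mask_rejected = False
try:
    df.iloc[mask]
except (ValueError, NotImplementedError) as err:
    mask_rejected = True
    print(type(err).__name__, err)

# A plain NumPy boolean array is accepted; column position 0 is Name
older = df.iloc[mask.to_numpy(), 0]
print(older.tolist())  # ['John Doe', 'Petra Müller', 'David Lee']
```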

.iloc vs .loc: When to Use Which

As a simple rule of thumb, the choice of method depends on what you know about the DataFrame:

  • Use .loc when you know the labels (names) of the rows/columns.
  • Use .iloc when you know the integer positions of the rows/columns.

Some scenarios favor either .loc or .iloc by their nature. For example, iterating over rows or columns is easier and more intuitive using integers than labels. As we already mentioned, filtering rows based on conditions on column values is less prone to errors using the column label names.

| Scenarios favoring .loc | Scenarios favoring .iloc |
|---|---|
| Your DataFrame has meaningful index/column names. | You're iterating over rows/columns by their position. |
| You need to filter based on conditions on column values. | The index/column names are not relevant to your task. |
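When what you know is mixed, say a row position but a column name, one option is to translate one side: df.index[i] gives the label at a position, and df.columns.get_loc() gives the position of a label. A sketch on the example df, ending with a simple position-based loop:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# Third row, "Name" column: translate the row position into a label for .loc ...
via_loc = df.loc[df.index[2], "Name"]
# ... or translate the column label into a position for .iloc
via_iloc = df.iloc[2, df.columns.get_loc("Name")]
print(via_loc, via_iloc)  # Ali Khan Ali Khan

# Iterating by position maps naturally onto .iloc
for i in range(len(df)):
    print(i, df.iloc[i]["Name"])
```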

KeyError, NameError, and IndexError With .loc and .iloc

Let’s take a look at possible problems. A common pitfall when using .loc is encountering a KeyError. This error occurs when we attempt to access a row or column label that doesn't exist in our DataFrame. To avoid it, we should make sure the labels we use exactly match the existing labels in the DataFrame and double-check for typos.

Additionally, string labels passed to .loc must always be wrapped in quotation marks. Forgetting them makes Python treat the label as a variable name, which raises a NameError.

An IndexError can occur when using .iloc if we specify an integer position that is outside the valid range of our DataFrame. This happens when the position is greater than or equal to the number of rows or columns, or when it is a negative value that reaches past the first row (negative positions are otherwise allowed and count from the end, so -1 is the last row). Slices are more forgiving: out-of-range slice bounds are clipped rather than raising an error. To prevent this error, check the dimensions of your DataFrame (for example with df.shape) and use positions within the valid range.
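The error cases above can be reproduced on the example df. In this sketch, C999 is a hypothetical label that does not exist, and the flags only record which exception was raised:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

key_error_raised = False
try:
    df.loc["C999"]  # hypothetical label that is not in the index
except KeyError as err:
    key_error_raised = True
    print("KeyError:", err)

index_errors = 0
for position in (10, -6):  # beyond the 5 rows, and before the first row
    try:
        df.iloc[position]
    except IndexError as err:
        index_errors += 1
        print("IndexError:", err)

print(df.iloc[-5]["Name"])  # John Doe: -5 is still a valid position
print(len(df.iloc[3:10]))   # 2: out-of-range slice bounds are clipped, no error
```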

Conclusion

I hope this blog has been helpful and the distinction between .loc and .iloc is clear by now. To learn more, here are some good next steps:


Author
Tom Farnschläder

After building a solid base in economics, law, and accounting in my dual studies at the regional financial administration, I first came into contact with statistics in my social sciences studies and my work as a tutor. Performing quantitative empirical analyses, I discovered a passion that led me to continue my journey further into the beautiful field of data science and learn analytics tools such as R, SQL, and Python. Currently, I am enhancing my practical skills at Deutsche Telekom, where I am able to receive lots of hands-on experience in coding data paths to import, process, and analyze data using Python.
