Iloc vs Loc in Pandas: A Guide With Examples
One of those annoying things that we’re all trying to figure out when we learn Pandas is the distinction between .loc and .iloc.
Let’s put an end to this confusion and clarify the difference between these two methods. I’ll give plenty of examples, and I hope the distinction will be much clearer by the end of this blog.
What Are .loc and .iloc in Pandas?
Both .loc and .iloc are essential attributes of Pandas DataFrames, and both are used for selecting specific subsets of data. Their purpose is to access and enable manipulating a specific part of the DataFrame instead of the whole DataFrame.
| Feature | .loc | .iloc |
| --- | --- | --- |
| Syntax | df.loc[row_indexer, column_indexer] | df.iloc[row_indexer, column_indexer] |
| Indexing Method | Label-based indexing | Position-based indexing |
| Used for Reference | Row and column labels (names) | Numerical indices of rows and columns (starting from 0) |
As we can see from the table, the syntax looks very similar. The difference lies in how we use the row_indexer and column_indexer arguments. This is because the two methods offer different approaches to indexing the data: while .loc indexes based on label names, .iloc takes the numerical position index of rows and columns as arguments.
Let’s examine each of the two methods in detail, starting with .loc.
Using .loc: Selection by Labels
To illustrate the concepts, let's consider a hypothetical customer database represented by this DataFrame called df, with the Customer ID representing the row index:
| Customer ID | Name | Country | Region | Age |
| --- | --- | --- | --- | --- |
| C123 | John Doe | United States | North America | 67 |
| C234 | Petra Müller | Germany | Europe | 51 |
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
| C567 | David Lee | China | Asia | 40 |
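To follow along, the table above could be built as shown below. This is only a sketch: the article does not show how df was created, so the constructor call and the use of a named index are assumptions.

```python
import pandas as pd

# Assumed construction of the example DataFrame; the values come from the table above.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    # Customer ID is the row index, not a regular column.
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

print(df)
```

Because Customer ID is the index, it does not count as a column: the first column (position 0) is Name.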
There are four primary ways to select rows with .loc. These include:
- Selecting a single row
- Selecting multiple rows
- Selecting a slice of rows
- Conditional row selection
Selecting a single row using .loc
To select a single row, we use the label of the row we want to retrieve as row_indexer. Accordingly, the syntax looks like this: df.loc['row_label']. Let’s use this to display all the information on our customer Ali Khan:
df.loc['C345']
|  | C345 |
| --- | --- |
| Name | Ali Khan |
| Country | Pakistan |
| Region | Asia |
| Age | 19 |
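One detail worth seeing in code is the type of the result. The sketch below rebuilds the example DataFrame (the construction is an assumption) and shows that a single label returns a Series, while a one-element list returns a one-row DataFrame.

```python
import pandas as pd

# Assumed construction of the example DataFrame from the article.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# A single label gives back one row as a Series whose index is the column names.
row = df.loc["C345"]
print(type(row).__name__)
print(row["Name"])

# Wrapping the label in a list keeps the result a DataFrame with one row.
one_row_frame = df.loc[["C345"]]
print(type(one_row_frame).__name__)
```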
Selecting multiple rows using .loc
If we want to select multiple rows that do not necessarily follow each other in order, we have to pass a list of their row labels as the row_indexer argument. This means we need to use not one but two pairs of square brackets: one for the regular .loc syntax and one for the label list.
The line df.loc[['row_label_1', 'row_label_2']] will return the two rows of the df DataFrame specified in the list. Let’s say we wanted to know not only the information on Ali Khan but also on David Lee:
df.loc[['C345', 'C567']]
| Customer ID | Name | Country | Region | Age |
| --- | --- | --- | --- | --- |
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C567 | David Lee | China | Asia | 40 |
Selecting a slice of rows using .loc
We can select a range of rows by passing the first and last row labels with a colon in between: df.loc['row_label_start':'row_label_end']. We could display the first four rows of our DataFrame like this:
df.loc['C123' : 'C456']
| Customer ID | Name | Country | Region | Age |
| --- | --- | --- | --- | --- |
| C123 | John Doe | United States | North America | 67 |
| C234 | Petra Müller | Germany | Europe | 51 |
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
There are two things to keep in mind here:
- The output includes the row specified in row_label_end. This is different in .iloc, which we’ll cover later.
- We only use one pair of square brackets, even though we want to retrieve multiple rows. We do not use a list to specify the various rows, so using two pairs of square brackets would raise a SyntaxError.
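The inclusive end of a label slice is easy to check side by side with a position slice. This is a small sketch on an assumed reconstruction of the example DataFrame.

```python
import pandas as pd

# Assumed construction of the example DataFrame from the article.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# Label slice: both endpoints are included, so C456 is part of the result.
by_label = df.loc["C123":"C456"]
print(len(by_label))  # 4 rows

# Position slice: the end position is excluded, so 0:3 stops before position 3 (C456).
by_position = df.iloc[0:3]
print(len(by_position))  # 3 rows
```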
Conditional selection of rows using .loc
We can also return rows based on a conditional expression. We can filter all rows by whether or not they fulfill a certain condition and only display the ones that do.
The corresponding syntax is df.loc[conditional_expression], with the conditional_expression being a statement about the allowed values in a specific column.
For columns with non-numeric data (like Name or Country), the statement will usually use the equal or unequal operator. Operators like < and > do work on strings, but they compare alphabetically, which is rarely meaningful for this kind of data. We could, for instance, return all rows of customers who are not from Asia:
df.loc[df['Region'] != 'Asia']
| Customer ID | Name | Country | Region | Age |
| --- | --- | --- | --- | --- |
| C123 | John Doe | United States | North America | 67 |
| C234 | Petra Müller | Germany | Europe | 51 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
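Conditions can also be combined or written as a membership test. The sketch below uses an assumed reconstruction of the example DataFrame. The isin variant and the combined condition are illustrations that go beyond the article's own example.

```python
import pandas as pd

# Assumed construction of the example DataFrame from the article.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# The condition is a boolean Series; .loc keeps the rows where it is True.
not_asia = df.loc[df["Region"] != "Asia"]

# Membership test: keep rows whose Region is one of the listed values.
listed_regions = df.loc[df["Region"].isin(["North America", "Europe"])]

# Combining conditions needs & (and) or | (or), with each condition in parentheses.
young_asia = df.loc[(df["Region"] == "Asia") & (df["Age"] < 30)]

print(not_asia)
print(young_asia)
```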
Selecting a single column using .loc
To select columns, we need to specify the column_indexer argument, which comes after the row_indexer argument. If we want to only specify the column_indexer, we need to somehow mark that we want to return all rows and only filter on the columns. Let’s see how we can do it!
Selecting a single column can be done by specifying the column_indexer with the label of the respective column. To retrieve all rows, we need to specify the row_indexer with a simple colon. We arrive at a syntax that looks like this: df.loc[:, 'column_name'].
Let’s display the Name of each customer:
df.loc[:, 'Name']
| Customer ID | Name |
| --- | --- |
| C123 | John Doe |
| C234 | Petra Müller |
| C345 | Ali Khan |
| C456 | Maria Gonzalez |
| C567 | David Lee |
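As with rows, a single column label returns a Series, while a list with one label returns a DataFrame. The following sketch, on an assumed reconstruction of the example DataFrame, shows the difference.

```python
import pandas as pd

# Assumed construction of the example DataFrame from the article.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# A plain label returns the column as a Series.
names_series = df.loc[:, "Name"]
print(type(names_series).__name__)

# A one-element list returns a DataFrame with a single column.
names_frame = df.loc[:, ["Name"]]
print(type(names_frame).__name__, names_frame.shape)
```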
Selecting multiple columns using .loc
Similar to selecting multiple rows, we need to pass a list of column labels if we want to return multiple columns of a DataFrame that do not necessarily follow each other in order: df.loc[:, ['col_label_1', 'col_label_2']].
Assuming we wanted to add all customers’ Age to our last output, it would work like this:
df.loc[:, ['Name', 'Age']]
| Customer ID | Name | Age |
| --- | --- | --- |
| C123 | John Doe | 67 |
| C234 | Petra Müller | 51 |
| C345 | Ali Khan | 19 |
| C456 | Maria Gonzalez | 26 |
| C567 | David Lee | 40 |
Selecting a slice of columns using .loc
Using a colon between the labels of two columns will select the two specified columns and every column positioned between them. The slice is inclusive of the end column, meaning the column named col_end will also be selected in the standard syntax, which is the following: df.loc[:, 'col_start':'col_end'].
If we were interested in the Name, Country, and Region of our customers, our code line could be:
df.loc[:, 'Name':'Region']
| Customer ID | Name | Country | Region |
| --- | --- | --- | --- |
| C123 | John Doe | United States | North America |
| C234 | Petra Müller | Germany | Europe |
| C345 | Ali Khan | Pakistan | Asia |
| C456 | Maria Gonzalez | Mexico | North America |
| C567 | David Lee | China | Asia |
Combined row and column selection using .loc
It’s also possible to specify both the row_indexer and the column_indexer. This could be used to retrieve a single piece of information, meaning one cell from the DataFrame. To do this, we specify one row and one column using the syntax df.loc['row_label', 'column_name'].
The more useful case is to return a sub-DataFrame that focuses on exactly the set of rows and columns we are interested in. It is possible to specify both indexers as lists using the square brackets, or as a slice using the colon, and even to combine it with a conditional expression for the row selection.
Here is one example of returning the Name, Country, and Region of each customer with an Age of over 30:
df.loc[df['Age'] > 30, 'Name':'Region']
| Customer ID | Name | Country | Region |
| --- | --- | --- | --- |
| C123 | John Doe | United States | North America |
| C234 | Petra Müller | Germany | Europe |
| C567 | David Lee | China | Asia |
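The combinations described in this section can be sketched as follows, again on an assumed reconstruction of the example DataFrame. The variable names are only illustrative.

```python
import pandas as pd

# Assumed construction of the example DataFrame from the article.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# One row label plus one column label returns a single cell value.
ali_age = df.loc["C345", "Age"]
print(ali_age)

# Condition for the rows, label slice for the columns.
over_30 = df.loc[df["Age"] > 30, "Name":"Region"]
print(over_30)

# Lists for both indexers.
picked = df.loc[["C123", "C567"], ["Name", "Age"]]
print(picked)
```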
Using .iloc: Selection by Integer Position
.iloc selects by position instead of label. This is the standard syntax of using .iloc: df.iloc[row_indexer, column_indexer]. There are two special things to look out for:
- Counting starting at 0: The first row and column have the index 0, the second one index 1, etc.
- Exclusivity of range end value: When using a slice, the row or column specified behind the colon is not included in the selection.
Selecting a single row using .iloc
A single row can be selected by using the integer representing the row index number as the row_indexer. We don’t need quotation marks since we are entering an integer number and not a label string as we did with .loc. To return the first row of a DataFrame called df, enter df.iloc[0].
In our example DataFrame, this very code line returns the information of John Doe:
df.iloc[0]
|  | C123 |
| --- | --- |
| Name | John Doe |
| Country | United States |
| Region | North America |
| Age | 67 |
Selecting multiple rows using .iloc
Selecting multiple rows works in .iloc as it does in .loc: we enter the row index integers in a list with square brackets. The syntax looks like this: df.iloc[[0, 3, 4]].
The respective output in our customer table can be seen below:
df.iloc[[0, 3, 4]]
| Customer ID | Name | Country | Region | Age |
| --- | --- | --- | --- | --- |
| C123 | John Doe | United States | North America | 67 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
| C567 | David Lee | China | Asia | 40 |
Selecting a slice of rows using .iloc
For selecting a slice of rows, we use a colon between two specified row index integers. Now, we have to pay attention to the exclusivity mentioned earlier.
We can take the line df.iloc[1:4] as an example to illustrate this concept. Index number 1 means the second row, so our slice starts there. The index integer 4 represents the fifth row – but since .iloc is not inclusive for slice selection, our output will include all rows up until the last before this one. Therefore, it will return the second, third, and fourth row.
Let’s prove that the line works as it should:
df.iloc[1:4]
| Customer ID | Name | Country | Region | Age |
| --- | --- | --- | --- | --- |
| C234 | Petra Müller | Germany | Europe | 51 |
| C345 | Ali Khan | Pakistan | Asia | 19 |
| C456 | Maria Gonzalez | Mexico | North America | 26 |
Selecting a single column using .iloc
The logic of selecting columns using .iloc follows what we have learned so far. Let’s see how it works for single columns, multiple columns and column slices.
Just like with .loc, it is important to specify the row_indexer before we can proceed to the column_indexer. To retrieve the values of the third column of df for every row, we enter df.iloc[:, 2].
Because Region is the third column in our DataFrame, it will be retrieved as a consequence of that code line:
df.iloc[:, 2]
| Customer ID | Region |
| --- | --- |
| C123 | North America |
| C234 | Europe |
| C345 | Asia |
| C456 | North America |
| C567 | Asia |
Selecting multiple columns using .iloc
To select multiple columns that are not necessarily subsequent, we can again enter a list containing integers as the column_indexer. The line df.iloc[:, [0, 3]] returns both the first and fourth columns.
In our case, the information displayed is the Name as well as the Age of each customer:
df.iloc[:, [0, 3]]
| Customer ID | Name | Age |
| --- | --- | --- |
| C123 | John Doe | 67 |
| C234 | Petra Müller | 51 |
| C345 | Ali Khan | 19 |
| C456 | Maria Gonzalez | 26 |
| C567 | David Lee | 40 |
Selecting a slice of columns using .iloc
For slice selection using .iloc, the logic of the column_indexer follows that of the row_indexer. The column represented by the integer after the colon is not included in the output. To retrieve the second and third columns, the code line should look like this: df.iloc[:, 1:3].
This line below returns all the geographical information we have about our customers:
df.iloc[:, 1:3]
| Customer ID | Country | Region |
| --- | --- | --- |
| C123 | United States | North America |
| C234 | Germany | Europe |
| C345 | Pakistan | Asia |
| C456 | Mexico | North America |
| C567 | China | Asia |
Combined row and column selection using .iloc
We can put together what we learned about .iloc to combine row and column selection. Again, it is possible to either return a single cell or a sub-DataFrame. To return the single cell at the intersection of row 3 and column 4, we enter df.iloc[2, 3].
Just like with .loc, we can specify both indexers as lists using the square brackets, or as a slice using the colon. If we want to select rows using conditional expressions, that is technically possible with .iloc as well, but only by passing a plain boolean array rather than a boolean Series, and it is not recommended. Using the label names and .loc is usually way more intuitive and less prone to errors.
This last example displays Country, Region and Age for the first, second and fifth row in our DataFrame:
df.iloc[[0,1,4], 1:4]
| Customer ID | Country | Region | Age |
| --- | --- | --- | --- |
| C123 | United States | North America | 67 |
| C234 | Germany | Europe | 51 |
| C567 | China | Asia | 40 |
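Here is a short sketch of the combined .iloc selections on an assumed reconstruction of the example DataFrame. The last part illustrates the conditional case: it converts the condition to a NumPy array with to_numpy() first, which is one possible workaround, not necessarily what the article had in mind.

```python
import pandas as pd

# Assumed construction of the example DataFrame from the article.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# Third row (position 2), fourth column (position 3): Ali Khan's Age.
cell = df.iloc[2, 3]
print(cell)

# List of row positions, slice of column positions (end position 4 is excluded).
sub = df.iloc[[0, 1, 4], 1:4]
print(sub)

# Conditional selection with .iloc: convert the boolean Series to a plain array first.
mask = (df["Age"] > 30).to_numpy()
over_30 = df.iloc[mask, 0:3]
print(over_30)
```

The same filter written as df.loc[df['Age'] > 30, 'Name':'Region'] needs no conversion, which is one reason .loc tends to be the more comfortable choice for conditions.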
.iloc vs .loc: When to Use Which
Generally, there is one simple rule of thumb where the method choice depends on your knowledge of the DataFrame:
- Use .loc when you know the labels (names) of the rows/columns.
- Use .iloc when you know the integer positions of the rows/columns.
Some scenarios favor either .loc or .iloc by their nature. For example, iterating over rows or columns is easier and more intuitive using integers than labels. As we already mentioned, filtering rows based on conditions on column values is less prone to errors using the column label names.
| Scenarios Favoring .loc | Scenarios Favoring .iloc |
| --- | --- |
| Your DataFrame has meaningful index/column names. | You're iterating over rows/columns by their position. |
| You need to filter based on conditions on column values. | The index/column names are not relevant to your task. |
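As a rough illustration of the position-based scenario, the sketch below walks through the rows by position on an assumed reconstruction of the example DataFrame. It also uses Index.get_loc to translate a column label into its position, which can help when a known label has to be used with .iloc. That helper is an addition for illustration and is not covered in the article.

```python
import pandas as pd

# Assumed construction of the example DataFrame from the article.
df = pd.DataFrame(
    {
        "Name": ["John Doe", "Petra Müller", "Ali Khan", "Maria Gonzalez", "David Lee"],
        "Country": ["United States", "Germany", "Pakistan", "Mexico", "China"],
        "Region": ["North America", "Europe", "Asia", "North America", "Asia"],
        "Age": [67, 51, 19, 26, 40],
    },
    index=pd.Index(["C123", "C234", "C345", "C456", "C567"], name="Customer ID"),
)

# Iterating by position: no need to know any row labels.
names_by_position = []
for position in range(len(df)):
    row = df.iloc[position]
    names_by_position.append(row["Name"])
print(names_by_position)

# Translating a label into a position when both are needed.
age_position = df.columns.get_loc("Age")
first_age = df.iloc[0, age_position]
print(age_position, first_age)
```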
KeyError, NameError, and IndexError With .loc and .iloc
Let’s take a look at possible problems. A common pitfall when using .loc is encountering a KeyError. This error occurs when we attempt to access a row or column label that doesn't exist within our DataFrame. To avoid this, we have to ensure that the labels we're using match the existing labels in our DataFrame exactly, and double-check for typos.
Additionally, it is important to always use quotation marks for string labels specified using .loc. Forgetting them will typically raise a NameError, because Python then reads the label as the name of an undefined variable.
An IndexError can occur when using .iloc if we specify an integer position that is outside the valid range of our DataFrame's indices. This happens when the position we're trying to access doesn't exist because it's beyond the number of rows or columns in our DataFrame. Negative positions are valid and count from the end (-1 is the last row), so they only raise an error if they reach back further than the DataFrame is long. Slices that run past the end do not raise an error and simply return the rows or columns that exist. To prevent this error, check the dimensions of your DataFrame and use appropriate index values within the valid range.
Conclusion
I hope this blog has been helpful and the distinction between .loc and .iloc is clear by now. To learn more, here are some good next steps:
After building a solid base in economics, law, and accounting in my dual studies at the regional financial administration, I first got into contact with statistics in my social sciences studies and work as tutor. Performing quantitative empirical analyses, I discovered a passion that led me to continue my journey further into the beautiful field of data science and learn analytics tools such as R, SQL, and Python. Currently, I am enhancing my practical skills at Deutsche Telekom, where I am able to receive lots of hands-on experience in coding data paths to import, process, and analyze data using Python.