Pandas Tutorial: DataFrames in Python

October 21st, 2016 in Python

Next to Matplotlib and NumPy, Pandas is one of the most widely used Python libraries in data science. It is mainly used for data munging, and with good reason: it’s very powerful and flexible, among many other things. It makes the least sexy part of the "sexiest job of the 21st Century" a bit more pleasant.

Besides this, there is something even better about it!

The Pandas library has the broader goal of becoming the most powerful and flexible open source data analysis and manipulation tool available in any language.

That's all the more reason for you to get started on working with this library and its expressive data structures straight away!

(For more practice, try the first chapter of this course on Pandas Dataframes.)

Content

  1. How To Create a Pandas DataFrame
  2. How To Select an Index or Column From a DataFrame
  3. How To Add an Index, Row or Column to a DataFrame
  4. How To Delete Indices, Rows or Columns From a DataFrame
  5. How To Rename the Columns or Indices of a DataFrame
  6. How To Format the Data in Your DataFrame
  7. How To Create an Empty DataFrame
  8. Does Pandas Recognize Dates When Importing Data?
  9. When, Why and How You Should Reshape Your DataFrame
  10. How To Iterate Over a DataFrame
  11. How To Write a DataFrame to a File

One of these structures is the DataFrame. With this tutorial, DataCamp wants to address 11 of the most popular Pandas DataFrame questions so that you understand -and avoid- the doubts of the Pythonistas who have gone before you.

The Beginning: What Are Pandas Data Frames?

Before we start off, let’s have a brief recap of what data frames are.

Those who are familiar with R know the data frame as a way to store data in rectangular grids that can easily be overviewed. Each row of these grids corresponds to measurements or values of an instance, while each column is a vector containing data for a specific variable. This means that a data frame’s rows do not need to contain values of the same type, although they can: the values can be numeric, character, logical, etc.

Data frames in Python are very similar: they come with the Pandas library, and they are defined as two-dimensional labeled data structures with columns of potentially different types.

In general, you could say that the Pandas data frame consists of three main components: the data, the index, and the columns.

  1. Firstly, the DataFrame can contain data that is:
  • a Pandas DataFrame
  • a Pandas Series: a one-dimensional labeled array capable of holding any data type with axis labels or index. An example of a Series object is one column from a DataFrame.
  • a NumPy ndarray, which can be a record array or a structured array
  • a two-dimensional ndarray
  • dictionaries of one-dimensional ndarrays, lists, dictionaries or Series.

Note that np.ndarray is the actual data type, while np.array() is a function to make arrays from other data structures.

Structured arrays allow users to manipulate the data by named fields: in the example below, a structured array of three tuples is created. The first element of each tuple will be called ‘foo’ and will be of type int, while the second element will be named ‘bar’ and will be a float.

Record arrays, on the other hand, expand the properties of structured arrays. They allow users to access fields of structured arrays by attribute rather than by index. You see below that the ‘foo’ values are accessed in the r2 record array.

An example:
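A minimal sketch of the structured array and record array described above. The field names ‘foo’ and ‘bar’ come from the text; the values in the tuples are illustrative assumptions.

```python
import numpy as np

# A structured array of three tuples: field 'foo' is an int, field 'bar' is a float
my_array = np.array([(1, 1.0), (2, 2.0), (3, 3.0)],
                    dtype=[('foo', int), ('bar', float)])

# Access a field of the structured array by name (index-style)
print(my_array['foo'])

# View the same data as a record array
my_array2 = my_array.view(np.recarray)

# Access the same field by attribute
print(my_array2.foo)
```

Both print statements show the same ‘foo’ values; only the way of accessing the field differs.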
  2. Besides the data that your DataFrame needs to contain, you can also specify the index and column names. The index, on the one hand, indicates the difference in rows, while the column names indicate the difference in columns. We will see later that these two components of the DataFrame are handy when you’re manipulating your data.

If you’re in doubt about Pandas DataFrames and how they differ from other data structures such as the NumPy array or a Series, you can watch the small presentation below:

Note that in this post, most of the time, the libraries that you need have already been loaded. The Pandas library is imported as pd, while the NumPy library is loaded as np. Remember that when you code in your own environment, you shouldn’t forget this import step!

Do you still remember how to do it?

import numpy as np
import pandas as pd

Awesome!

Now that there is no doubt in your mind about what data frames are, what they can do and how they differ from other structures, it’s time to plunge into your questions.

1. How To Create a Pandas DataFrame

Obviously, making your DataFrames is your first step in almost anything that you want to do when it comes to data munging in Python. Maybe you want to start from scratch to make a data frame, but you can also convert other data structures.

Note that the data inputted to the data frame can vary!

This section will only cover making a Pandas DataFrame from other data structures, such as NumPy arrays.

To read more on making empty dataframes that you can fill up with data later, go to question 7.

A NumPy ndarray is one of the many things that can serve as input to make a DataFrame. To make a data frame from a NumPy array, you can just pass it to the DataFrame() function in the data argument.
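A minimal sketch of this approach, assuming a small array whose first row holds the column names Col1 and Col2 and whose first column holds the row labels Row1 and Row2:

```python
import numpy as np
import pandas as pd

# The first row holds the column names, the first column holds the row labels
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])

# Slice the array into values, index and column names
df = pd.DataFrame(data=data[1:, 1:],
                  index=data[1:, 0],
                  columns=data[0, 1:])
print(df)
```

Because the array mixes strings and numbers, NumPy stores every element as a string, so the values in this DataFrame are strings rather than integers.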

Note how the code chunks above select elements from the NumPy array to construct the DataFrame: you first select the values that are contained in the lists that start with Row1 and Row2, then you select the index or row numbers Row1 and Row2 and then the column names Col1 and Col2.

Do you still remember how subsetting works in 2D NumPy arrays? You first indicate the row that you want to look in for your data, then the column. Don’t forget that the indices start at 0! For the data, you look in the rows from index 1 to the end, and within those rows you select all elements from index 1 onward. You end up selecting 1, 2, 3 and 4.

This approach to making data frames will be the same for all the structures that DataFrame() can take on as input.

Try it out in the code chunk below:

Remember that the Pandas library has already been imported for you as pd.
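As a sketch of how the same DataFrame() call handles different inputs, the snippet below passes a 2D array, a dictionary, a DataFrame and a Series. The variable names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Take a 2D array as input
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(my_2darray)
print(df_from_array)

# Take a dictionary as input: the keys become the column names
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
df_from_dict = pd.DataFrame(my_dict)
print(df_from_dict)

# Take a DataFrame as input
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
df_from_df = pd.DataFrame(my_df)
print(df_from_df)

# Take a Series as input: the keys of the dictionary become the index
my_series = pd.Series({"Belgium": "Brussels",
                       "India": "New Delhi",
                       "United Kingdom": "London",
                       "United States": "Washington"})
df_from_series = pd.DataFrame(my_series)
print(df_from_series)
```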

Note that the index of your Series (and DataFrame) contains the keys of the original dictionary. In older versions of Pandas these keys were sorted, so Belgium would be the index at 0 and United States the index at 3; more recent versions keep the order in which the keys were inserted.

After you have created your data frame, you might want to know a little bit more about it. You can use the shape property or the len() function in combination with the .index property:
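A short sketch of both options, assuming a small DataFrame with two rows and three columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# Use the `shape` property: (number of rows, number of columns)
print(df.shape)

# Or use the `len()` function with the `index` property: number of rows only
print(len(df.index))
```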

Note how these two options give you slightly different information on your DataFrame: the shape property will give you the dimensions of your DataFrame. So you will get to know the width and the height of your DataFrame. On the other hand, the len() function, in combination with the index property, will only give you information on the height of your DataFrame.

None of this is surprising, though, as you explicitly pass in the index property.

You could also use df[0].count() to get to know more about the height of your DataFrame, but this will exclude the NaN values (if there are any). That is why calling .count() on your DataFrame is not always the better option.

If you want more information on your DataFrame columns, you can always execute list(my_dataframe.columns.values). Try this out for yourself in the DataCamp Light block above!
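To see why .count() can differ from the number of rows, here is a small sketch with one missing value. The data is a made-up example, not taken from the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 0 contains one NaN
df = pd.DataFrame(np.array([[1, 2, 3], [np.nan, 5, 6]]))

# Number of rows, including the row with the NaN
print(len(df.index))

# Number of non-NaN values in column 0 only
print(df[0].count())

# The column labels as a plain Python list
print(list(df.columns.values))
```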

Fundamental DataFrame Operations

Now that you have put your data in a more convenient Pandas DataFrame structure, it’s time to get to the real work!

This first section will guide you through the first steps of working with data frames in Python. It will cover the basic operations that you can do on your newly made DataFrame: adding, selecting, deleting, renaming, … You name it!

Later, you will need these operations to go and do the more advanced wizardry with Pandas DataFrames.

2. How To Select an Index or Column From a Pandas DataFrame

Before you start with adding, deleting and renaming the components of your DataFrame, you first need to know how you can select these elements.

So, how do you do this?

Well, in essence, selecting an index, column or value from your DataFrame isn’t that hard. It’s really very similar to what you see in other languages that are used for data analysis (and which you might already know!).

Let’s take R for example. You use the [,] notation to access the data frame’s values.

In Pandas DataFrames, this is not too much different.

Let’s say you have a DataFrame like this one

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9

And you want to access the value that is at index 0, in column ‘A’.

Well, here are the various options that exist to get your value 1 back:
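A sketch of four accessors that work in current Pandas versions, using the DataFrame shown above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['A', 'B', 'C'])

# Using `iloc[]`: positions for both row and column
value_iloc = df.iloc[0, 0]

# Using `loc[]`: labels for both row and column
value_loc = df.loc[0, 'A']

# Using `at[]`: label-based access to a single value
value_at = df.at[0, 'A']

# Using `iat[]`: position-based access to a single value
value_iat = df.iat[0, 0]

print(value_iloc, value_loc, value_at, value_iat)
```

Here the row label 0 and the row position 0 happen to coincide, because the DataFrame has the default integer index.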

The most important ones to remember are, without a doubt, loc and iloc. The subtle differences between these two will be discussed in the next sections.

Enough for now about selecting values from your DataFrame. What about selecting rows and columns?

In that case, you would use:
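For whole rows and columns, a minimal sketch on the same three-by-three DataFrame could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['A', 'B', 'C'])

# Use `iloc[]` to select the row at position 0
first_row = df.iloc[0]
print(first_row)

# Use `loc[]` to select the column labeled 'A'
column_a = df.loc[:, 'A']
print(column_a)
```

Both selections come back as a Pandas Series.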

For now, it suffices to know that you can either access the values by calling them by their label or by their position in the index or column. If you don’t see this, look again at the slight differences in the commands: one time, you see [0][0], the other time, you see [0,'A'] to retrieve your value 1.

3. How To Add an Index, Row or Column to a Pandas DataFrame

Now that you have learned how to select a value from a DataFrame, it’s time to get to the real work and add an index, row or column to it!

Adding an Index to a DataFrame

When you create a DataFrame, you have the option to add input to the ‘index’ argument to make sure that you have the index that you desire. When you don’t specify this, your DataFrame will have, by default, a numerically valued index that starts with 0 and continues until the last row of your DataFrame.

However, even when your index is specified for you automatically, you still have the power to re-use one of your columns and make it your index. You can easily do this by calling set_index() on your DataFrame.

Try it out below!
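A minimal sketch of set_index(), assuming a DataFrame with columns A, B and C in which column C becomes the index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6]]),
                  columns=['A', 'B', 'C'])
print(df)

# Set 'C' as the index; this returns a new DataFrame and leaves `df` unchanged
df_indexed = df.set_index('C')
print(df_indexed)
```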

Adding Rows to a DataFrame

Before you can get to the solution, it’s first a good idea to grasp the concept of loc and how it differs from other indexing attributes such as .iloc and .ix. Note that .ix was deprecated in Pandas 0.20 and removed in Pandas 1.0, so it only works in older versions.

  • loc works on labels of your index. This means that if you give in loc[2], you look for the values of your DataFrame that have an index labeled 2.
  • iloc works on the positions in your index. This means that if you give in iloc[2], you look for the values of your DataFrame that are at position 2.
  • ix is a more complex case: when the index is integer-based, you pass a label to ix. ix[2] then means that you’re looking in your DataFrame for values that have an index labeled 2. This is just like loc! However, if your index is not solely integer-based, ix will work with positions, just like iloc.
This all might seem very complicated. Let’s illustrate all of this with a small example:
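The sketch below contrasts loc and iloc on a DataFrame whose index is not solely integer-based. The .ix accessor is left out here, since it is not available in recent Pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[2, 'A', 4],
                  columns=[48, 49, 50])

# Pass `2` to `loc`: the row whose index label is 2
by_label = df.loc[2]
print(by_label)

# Pass `2` to `iloc`: the row at position 2 (the third row)
by_position = df.iloc[2]
print(by_position)
```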

Note that we used an example of a DataFrame whose index is not solely integer-based, to make it easier for you to understand the differences. In this case, you clearly see that passing 2 to loc or iloc/ix does not give back the same result!

  • We know that loc will go and look at the values that are at label 2. The result that you get back will be:

    48    1
    49    2
    50    3
  • We also know that iloc will go and look at the positions in the index. When you pass 2, you will get back:

    48    7
    49    8
    50    9
  • Since the index doesn’t only contain integers, ix will have the same behavior as iloc and look at the positions in the index. You will get back the same result as iloc.

Now that the difference between iloc, loc and ix is clear, you are ready to give adding rows to your DataFrame a go!

As a consequence of what has just been explained, you understand that the general recommendation is that you use .loc to insert rows in your DataFrame.

If you would use df.ix[], you might try to reference a numerically valued index with the index value and accidentally overwrite an existing row of your DataFrame.

You had better avoid this!

Check out the difference once more in the DataFrame below:
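A sketch of the difference between overwriting and adding a row. It uses iloc for the positional case as a stand-in for the positional behavior of .ix, which is an assumption made so that the code runs on current Pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[2.5, 12.6, 4.8],
                  columns=[48, 49, 50])

# Position-based assignment: overwrites the existing row at position 2
df.iloc[2] = [60, 50, 40]
print(df)

# Label-based assignment: there is no index labeled 2, so a new row is added
df.loc[2] = [11, 12, 13]
print(df)
```

After the second assignment the DataFrame has four rows: the three original index labels plus the new label 2.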

You can see why all of this can be confusing, right?

Adding a Column to Your DataFrame

In some cases, you want to make your index part of your DataFrame. You can easily do this by assigning the .index property to a column of your DataFrame, either an existing one or one that you haven’t made yet, just like this:
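A minimal sketch, assuming a DataFrame with columns A, B and C and a new column called D that receives the index values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['A', 'B', 'C'])

# Use `.index`: copy the index values into a new column 'D'
df['D'] = df.index

print(df)
```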

In other words, you copy the index values into a column of your DataFrame.

However, if you want to append columns to your DataFrame, you could also follow the same approach as when adding rows to your DataFrame: you use loc. Note that iloc cannot be used for this, because it only addresses positions that already exist.

Remember that you could consider a Series object much like a column of a DataFrame. In this case, we add a Series to an existing DataFrame with the help of loc:
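A sketch of appending a column with loc, assuming a small DataFrame with columns labeled 1, 2 and 3 and a new column labeled 4:

```python
import pandas as pd

df = pd.DataFrame(data={1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']})
print(df)

# Append a column labeled 4; the Series shares the index of `df`,
# so the values line up row by row
df.loc[:, 4] = pd.Series(['5', '6'], index=df.index)

print(df)
```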

Note that the observation that was made earlier about loc still stays valid also for when you’re adding columns to your DataFrame!

Resetting the Index of Your DataFrame

When your index doesn’t look entirely the way you want it to, you can opt to reset it. This can easily be done with .reset_index().

Note that you can pass several arguments that can make or break the success of your reset:
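The sketch below shows the effect of the drop argument on a DataFrame with a float index. The names `dropped` and `kept` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  index=[2.5, 12.6, 4.8],
                  columns=[48, 49, 50])
print(df)

# With drop=True the old index is discarded
dropped = df.reset_index(level=0, drop=True)
print(dropped)

# Without drop, the old index is kept as an extra column named 'index'
kept = df.reset_index(level=0)
print(kept)
```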

Now try replacing the drop argument by inplace in the code chunk above and see what happens!

Note how you use the drop argument to indicate that you want to get rid of the index that was there. If you had left out drop and used inplace=True instead, the original index with floats would have been added as an extra column to your DataFrame, and the DataFrame itself would have been modified.

4. How To Delete Indices, Rows or Columns From a Pandas DataFrame

Now that you have seen how to select and add indices, rows, and columns to your DataFrame, it’s time to consider another use case: removing these three from your data structure.

Deleting an Index from Your DataFrame

If you want to remove the index from your DataFrame, you should reconsider, because DataFrames and Series always have an index.

What you can do is, for example:

  • reset the index of your DataFrame (go back to the previous section to see how it is done),
  • remove the index name, if there is any, by setting df.index.name to None (older code uses del df.index.name, which may fail in recent Pandas versions),
  • remove duplicate index values by resetting the index, dropping the duplicates of the index column that has been added to your DataFrame and reinstating that duplicate-free column as the index:
  • and lastly, remove an index, and with it a row. This is elaborated in one of the next sections.
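The duplicate-index option from the list above can be sketched as follows, assuming a DataFrame in which the index labels 2.5 and 4.8 each occur twice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                                 [40, 50, 60], [23, 35, 37]]),
                  index=[2.5, 12.6, 4.8, 4.8, 2.5],
                  columns=[48, 49, 50])

# 1. reset_index() moves the index into a column named 'index'
# 2. drop_duplicates() keeps only the last row for each index value
# 3. set_index() reinstates that column as the index
deduplicated = (df.reset_index()
                  .drop_duplicates(subset='index', keep='last')
                  .set_index('index'))
print(deduplicated)
```

With keep='last', the later of two rows that share an index value survives; keep='first' would retain the earlier one instead.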

Now that you know how to remove an index from your DataFrame, you can go on to removing columns and rows!

Deleting a Column from Your DataFrame

To get rid of (a selection of) columns from your DataFrame, you can use the drop() method:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['A', 'B', 'C'])

# Check out the DataFrame `df`
print(df)

# Drop the column with label 'A'
df.drop('A', axis=1, inplace=True)

# Drop the column at position 1
df.drop(df.columns[[1]], axis=1)

You might think now: well, this is not so straightforward; there are some extra arguments that are passed to the drop() method!

  • The axis argument is 0 when you want to drop rows and 1 when you want to drop columns.
  • You can set inplace to True to delete the column without having to reassign the DataFrame.

Note that you can also delete duplicate values from a column with drop_duplicates().
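
As a rough sketch of what that can look like (the DataFrame below is made up for illustration), you can call drop_duplicates() on a single column to get its unique values, or on the whole DataFrame with the subset argument to judge duplicates by that column only:

```python
import pandas as pd

# Hypothetical data with repeated values in column 'A'
df = pd.DataFrame({'A': [1, 1, 2, 3, 3],
                   'B': ['x', 'y', 'z', 'x', 'y']})

# Unique values of one column; the first occurrence is kept by default
unique_a = df['A'].drop_duplicates()

# Whole rows, de-duplicated by looking at column 'A' only
rows_unique_on_a = df.drop_duplicates(subset='A')

print(unique_a)
print(rows_unique_on_a)
```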

Removing a Row from Your DataFrame

You can remove duplicate rows from your DataFrame by executing df.drop_duplicates(). You can also remove rows from your DataFrame, taking into account only the duplicate values that exist in one column.

Check out this example:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3, 4], [4, 5, 6, 5], [7, 8, 9, 6], [23, 50, 60, 7], [23, 35, 37, 23]]),
                  index=[2.5, 12.6, 4.8, 4.8, 2.5],
                  columns=[48, 49, 50, 50])

# Check out your DataFrame `df`
print(df)

# Drop the duplicates in `df`
df.drop_duplicates([48], keep='last')

If there is no uniqueness criterion to the deletion that you want to perform, you can use the drop() method, where you use the index property to specify the index of which rows you want to remove from your DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

# Check out the DataFrame `df`
print(df)

# Drop the index at position 1
print(df.drop(df.index[1]))

After this command, you might want to reset the index again.

Tip: Try resetting the index of the resulting DataFrame for yourself! Don’t forget to use the drop argument if you deem it necessary.
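
If you want to compare your attempt with something, here is one possible sketch. It reuses the small A/B/C DataFrame from this section; the variable names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

# Drop the row at position 1; the remaining index labels are 0 and 2
dropped = df.drop(df.index[1])

# drop=True throws the old index away and renumbers from 0
renumbered = dropped.reset_index(drop=True)

# Without drop=True, the old labels come back as a column named 'index'
with_old_index = dropped.reset_index()

print(renumbered)
print(with_old_index)
```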

5. How to Rename the Index or Columns of a Pandas DataFrame

To give the columns or the index values of your DataFrame different names, it’s best to use the .rename() method.

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

# Check out your DataFrame `df`
print(df)

# Define the new names of your columns
newcols = {
    'A': 'new_column_1',
    'B': 'new_column_2',
    'C': 'new_column_3'
}

# Use `rename()` to rename your columns
df.rename(columns=newcols, inplace=True)

# Rename your index
df.rename(index={1: 'a'})

Tip: try changing the inplace argument in the first task (renaming your columns) to False and see what the script now renders as a result. You see that now the DataFrame hasn’t been reassigned when renaming the columns. As a result, the second task takes the original DataFrame as input and not the one that you just got back from the first rename() operation.
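
The following sketch shows one way to keep both changes without inplace=True: assign the returned DataFrame and continue from there. The variable name renamed is just an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

# With the default inplace=False, rename() returns a new DataFrame
renamed = df.rename(columns={'A': 'new_column_1'})

# Rename the index on the returned object so both changes are kept
renamed = renamed.rename(index={1: 'a'})

print(df.columns.tolist())       # the original is untouched
print(renamed.columns.tolist())
print(renamed.index.tolist())
```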

Beyond The Pandas DataFrame Basics

Now that your first questions about Pandas’ DataFrames have been addressed, it’s time to go beyond the basics and get our hands dirty for real. Because there is far more to DataFrames than what you have seen in the first section.

6. How To Format The Data in Your Pandas DataFrame

Most of the time, you will also want to be able to do some operations on the actual values that are in your DataFrame.

Keep on reading to find out what the most common Pandas questions are when it comes to formatting your DataFrame’s values!

Replacing All Occurrences of a String in a DataFrame

To replace certain strings in your DataFrame, you can use replace(): pass the values that you would like to change, followed by the values you want to replace them with.

Just like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([['OK', 'Perfect', 'Acceptable'], ['Awful', 'Awful', 'Perfect'], ['Acceptable', 'OK', 'Poor']]),
                  columns=['Student1', 'Student2', 'Student3'])

# Study the DataFrame `df` first
print(df)

# Replace the strings by numerical values (0-4)
df.replace(['Awful', 'Poor', 'OK', 'Acceptable', 'Perfect'], [0, 1, 2, 3, 4])

Note that there is also a regex argument that can help you out tremendously when you’re faced with strange string combinations:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([['1\n', 2, '3\n'], [4, 5, '6\n'], [7, '8\n', 9]]))

# Check out your DataFrame `df`
print(df)

# Replace strings by others with `regex`
df.replace({'\n': '<br>'}, regex=True)

In short, replace() is mostly what you need when you want to replace values or strings in your DataFrame with others.
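
Besides two lists, replace() also accepts a dictionary that maps old values to new ones. The sketch below assumes a made-up grading table and hypothetical 'Pass' and 'Fail' labels, purely to show the shape of the call:

```python
import pandas as pd

# Hypothetical grades, similar to the example above
df = pd.DataFrame({'Student1': ['OK', 'Awful', 'Acceptable'],
                   'Student2': ['Perfect', 'Awful', 'OK']})

# Keys are the values to look for, dictionary values are their replacements
merged = df.replace({'Awful': 'Fail', 'OK': 'Pass', 'Acceptable': 'Pass'})

print(merged)
```

Values that don't appear in the dictionary, such as 'Perfect', are left as they are.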

Removing Parts From Strings in the Cells of Your DataFrame

Removing unwanted parts of strings is cumbersome work. Luckily, there is a solution in place!

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, '+3b'], [4, 5, '-6B'], [7, 8, '+9A']]), columns=['class', 'test', 'result'])

# Check out your DataFrame
print(df)

# Delete unwanted parts from the strings in the `result` column
df['result'] = df['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))

# Check out the result again
df

You use map() on the result column to apply the lambda function to each element of the column. The function takes the string value, strips any + or - characters on the left, and strips any of the six characters aAbBcC on the right.
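
As an alternative sketch, the same clean-up can be written with the string methods that Pandas exposes through the .str accessor, so you don't need a lambda. The small DataFrame here only contains the result column, for illustration:

```python
import pandas as pd

df = pd.DataFrame({'result': ['+3b', '-6B', '+9A']})

# .str.lstrip() and .str.rstrip() work on every string in the column at once
df['result'] = df['result'].str.lstrip('+-').str.rstrip('aAbBcC')

print(df)
```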

Splitting Text in a Column into Multiple Rows in a DataFrame

Splitting your text into multiple rows is quite complex. The next code chunk will walk you through the steps:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[34, 0, '23:44:55'], [22, 0, '66:77:88'], [19, 1, '43:68:05 56:34:12']]), columns=['Age', 'PlusOne', 'Ticket'])

# Inspect your DataFrame `df`
print(df)

# Split out the two values in the third row
# Make it a Series
# Stack the values
ticket_series = df['Ticket'].str.split(' ').apply(pd.Series).stack()

# Get rid of the stack:
# Drop the level to line up with the DataFrame
ticket_series.index = ticket_series.index.droplevel(-1)

# Make your series a dataframe
ticketdf = pd.DataFrame(ticket_series)

# Delete the `Ticket` column from your DataFrame
del df['Ticket']

# Join the ticket DataFrame to `df` and keep the result
df = df.join(ticketdf)

# Check out the new `df`
print(df)

In short, what you do is:

  • First, you inspect the DataFrame at hand. You see that the value in the last row of the last column is a bit too long. It appears there are two tickets because a guest has taken a plus-one to the concert.
  • You take the Ticket column from the DataFrame df and split its strings on a space. This will make sure that the two tickets will end up in two separate rows in the end. Next, you take these four values (the four ticket numbers) and turn the list of each row into a Series object, which gives you this table:

              0         1
    0  23:44:55       NaN
    1  66:77:88       NaN
    2  43:68:05  56:34:12

    That still doesn’t seem quite right. You have NaN values in there! You have to stack this result to make sure you don’t have any NaN values in the resulting Series.
  • Next, you see that your Series is stacked.

    0  0    23:44:55
    1  0    66:77:88
    2  0    43:68:05
       1    56:34:12

    That is not ideal either. That is why you drop the level to line up with the DataFrame:

    0    23:44:55
    1    66:77:88
    2    43:68:05
    2    56:34:12
    dtype: object
    That is what you’re looking for.
  • Transform your Series to a DataFrame to make sure you can join it back to your initial DataFrame. However, to avoid having any duplicates in your DataFrame, you can delete the original Ticket column.
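
For comparison, here is a shorter sketch of the same reshaping. It assumes a Pandas version that has DataFrame.explode() (0.25 or later), which is not what the walk-through above uses; the data matches the example, but with numeric Age and PlusOne columns.

```python
import pandas as pd

df = pd.DataFrame({'Age': [34, 22, 19],
                   'PlusOne': [0, 0, 1],
                   'Ticket': ['23:44:55', '66:77:88', '43:68:05 56:34:12']})

# Split each string into a list, then give every list element its own row
exploded = df.assign(Ticket=df['Ticket'].str.split(' ')).explode('Ticket')

print(exploded)
```

Just like in the walk-through, the row with two tickets appears twice and keeps its original index label 2.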

Applying A Function to Your Pandas DataFrame’s Columns or Rows

You might want to adjust the data in your DataFrame by applying a function to it. Let’s begin answering this question by making our own lambda function:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])
doubler = lambda x: x*2

# Study the `df` DataFrame
print(df)

# Apply the `doubler` function to the `A` DataFrame column
df['A'].apply(doubler)

Note that you can also select a row of your DataFrame and apply the doubler lambda function to it. Remember that you can easily select a row from your DataFrame by using loc or iloc.

Then, you would execute something like this, depending on whether you want to select your index based on its position or based on its label:

df.loc[0].apply(doubler)

Note that the apply() function only applies the doubler function along an axis of your DataFrame. That means that you target either the index or the columns. Or, in other words, either a row or a column.

If, however, you want to apply it to each element (element-wise), you can make use of the map() function. You can just replace the apply() function in the code chunk above with map(). Don’t forget to still pass the doubler function to it to make sure you multiply the values by 2.
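
To see the difference between working along an axis and working element-wise, consider this small sketch. The summing lambdas and variable names are only there for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])
doubler = lambda x: x * 2

# apply() on a DataFrame hands whole columns (axis=0) or whole rows (axis=1) to the function
col_totals = df.apply(lambda col: col.sum(), axis=0)
row_totals = df.apply(lambda row: row.sum(), axis=1)

# map() on a Series calls the function once per element
doubled_a = df['A'].map(doubler)

print(col_totals)
print(row_totals)
print(doubled_a)
```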

Let’s say you want to apply this doubling function not only to the A column of your DataFrame, but to the whole of it. In this case, you can use applymap() to apply the doubler function to every single element in the entire dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])
doubler = lambda x: x*2

# applymap() returns a new DataFrame, so keep the result
# (Pandas 2.1 and later also offer this as DataFrame.map())
doubled_df = df.applymap(doubler)
print(doubled_df)

Note that in these cases, we have been working with lambda functions or anonymous functions that get created at runtime. However, you can also write your own function. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

def doubler(x):
    if x % 2 == 0:
        return x
    else:
        return x * 2

# Use `applymap()` to apply `doubler()` to your DataFrame
doubled_df = df.applymap(doubler)

# Check the resulting DataFrame
print(doubled_df)

If you want more information on the flow of control in Python, you can always read up on it here.

7. How To Create an Empty DataFrame

The function that you will use is the Pandas DataFrame() function: you can pass it the data that you want to put in, the indices and the columns.

Remember that the data that is contained within the data frame doesn’t have to be homogeneous.

There are several ways in which you can use this function to make an empty data frame.

Firstly, you can use numpy.nan to initialize your data frame with NaNs. Note that numpy.nan has type float.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.nan, index=[0,1,2,3], columns=['A'])
print(df)

Right now, the data type of the data frame is inferred by default: because numpy.nan has type float, the data frame will also contain values of type float. You can, however, also force the data frame to be of a certain type by adding the argument dtype and filling in the desired type. Just like in this example:

import pandas as pd

df = pd.DataFrame(index=range(0,4),columns=['A'], dtype='float')
print(df)

Note that if you don’t specify the axis labels or index, they will be constructed from the input data based on common sense rules.
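
The sketch below contrasts a few of these cases; the variable names are made up for illustration. Note that a NaN-filled data frame still has rows, so Pandas does not report it as empty, while a data frame with only column labels is empty in the strict sense.

```python
import numpy as np
import pandas as pd

# Only column labels, no rows: a truly empty DataFrame
no_rows = pd.DataFrame(columns=['A', 'B'])

# NaN-filled DataFrame: it has rows, so `.empty` is False
nan_filled = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A'])

# No labels given: Pandas falls back to integer positions 0, 1, ...
unlabeled = pd.DataFrame([[1, 2], [3, 4]])

print(no_rows.empty, nan_filled.empty)
print(unlabeled.index.tolist(), unlabeled.columns.tolist())
```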

8. Does Pandas Recognize Dates When Importing Data?

Pandas can recognize them, but you need to help it a tiny bit: add the argument parse_dates when you’re reading in data from, let’s say, a comma-separated value (CSV) file:

import pandas as pd
pd.read_csv('yourFile', parse_dates=True)

# or this option:
pd.read_csv('yourFile', parse_dates=['columnName'])

There are, however, always weird date-time formats.

(Honestly, who has never had this?)

In such cases, you can construct your own parser to deal with this. You could, for example, make a lambda function that takes your DateTime string and parses it with a format string.

from datetime import datetime
import pandas as pd

# Parse strings such as '2016-10-21 08:30:00'
dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')

# Which makes your read command:
# (recent Pandas versions deprecate date_parser in favor of date_format)
pd.read_csv(infile, parse_dates=['columnName'], date_parser=dateparse)

# Or combine two columns into a single DateTime column
pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
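
If you prefer not to pass a parser function at all, another option is to read the column as text and convert it afterwards with pd.to_datetime() and an explicit format string. The following is only a sketch: the CSV content and the column name 'when' are invented, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path
csv_text = "when,value\n2016-10-21 08:30:00,1\n2016-10-22 09:45:00,2\n"

# Option 1: let read_csv parse the column while reading
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=['when'])

# Option 2: read the column as text, then convert with an explicit format
raw = pd.read_csv(io.StringIO(csv_text))
raw['when'] = pd.to_datetime(raw['when'], format='%Y-%m-%d %H:%M:%S')

print(parsed.dtypes)
print(raw.dtypes)
```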

9. When, Why And How You Should Reshape Your Pandas DataFrame

Reshaping your DataFrame is basically transforming it so that the resulting structure makes it more suitable for your data analysis.

In other words, reshaping is not so much concerned with formatting the values that are contained within the DataFrame, but more about transforming the shape of it.

This answers the when and why. Now onto the how of reshaping your DataFrame.

There are three ways of reshaping that frequently raise questions with users: pivoting, stacking and unstacking, and melting.

Keep on reading to find out more!

Pivoting Your DataFrame

You can use the pivot() function to create a new derived table out of your original one. When you use the function, you can pass three arguments:

  1. Values: this argument allows you to specify which values of your original DataFrame you want to see in your pivot table.
  2. Columns: whatever you pass to this argument will become a column in your resulting table.
  3. Index: whatever you pass to this argument will become an index in your resulting table.

# Import pandas
import pandas as pd

# Create your DataFrame
products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                         'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                         'price':[11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                         'testscore': [4, 3, 5, 7, 5, 8]})

# Use `pivot()` to pivot the DataFrame
pivot_products = products.pivot(index='category', columns='store', values='price')

# Check out the result
print(pivot_products)

When you don’t specifically fill in what values you expect to be present in your resulting table, you will pivot by multiple columns:

# Import the Pandas library
import pandas as pd

# Construct the DataFrame
products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                         'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                         'price':[11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                         'testscore': [4, 3, 5, 7, 5, 8]})

# Use `pivot()` to pivot your DataFrame
pivot_products = products.pivot(index='category', columns='store')

# Check out the results
print(pivot_products)

Note that your data cannot have rows with duplicate values for the columns that you specify. If it does, you will get an error message. If you can’t ensure the uniqueness of your data, you will want to use the pivot_table method instead:

# Import the Pandas library
import pandas as pd

# Your DataFrame
products = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                         'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia','Walmart'],
                         'price':[11.42, 23.50, 19.99, 15.95, 19.99, 111.55],
                         'testscore': [4, 3, 5, 7, 5, 8]})

# Pivot your `products` DataFrame with `pivot_table()`
pivot_products = products.pivot_table(index='category', columns='store', values='price', aggfunc='mean')

# Check out the results
print(pivot_products)

Note the additional argument aggfunc that gets passed to the pivot_table method. This argument specifies the aggregation function that is used to combine multiple values. In this example, the mean function is used.
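
To see aggfunc actually combine values, you need data in which the same index and column combination occurs more than once. The sales table below is made up for that purpose; the sketch shows pivot() refusing the data and pivot_table() averaging the duplicates.

```python
import pandas as pd

# Hypothetical data: the ('Tech', 'Dia') combination appears twice
sales = pd.DataFrame({'category': ['Tech', 'Tech', 'Cleaning'],
                      'store': ['Dia', 'Dia', 'Walmart'],
                      'price': [10.0, 20.0, 5.0]})

# pivot() cannot decide which of the two prices to keep
try:
    sales.pivot(index='category', columns='store', values='price')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() combines the duplicates with the aggregation function
table = sales.pivot_table(index='category', columns='store', values='price', aggfunc='mean')

print(pivot_failed)
print(table)
```

Combinations that never occur in the data, such as Cleaning at Dia, show up as NaN in the result.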

Using stack() and unstack() to Reshape Your Pandas DataFrame

You have already seen an example of stacking in the answer to question 6!

The good news is that you already know why you would use it and how to do it.

To repeat, when you stack a DataFrame, you make it taller. You move the innermost column index to become the innermost row index. The result has an index with a new innermost level of row labels.

Go back to the full walk-through in the answer to question 6, “Splitting Text in a Column into Multiple Rows in a DataFrame”, if you’re unsure of the workings of stack().

The inverse of stacking is called unstacking. Much like stack(), you use unstack() to move the innermost row index to become the innermost column index.
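
A minimal round trip might look like the sketch below. The store names and numbers are borrowed loosely from the pivoting example and are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'price': [23, 11], 'testscore': [3, 4]},
                  index=['Dia', 'Walmart'])

# stack(): the column labels become the innermost level of the row index
stacked = df.stack()

# unstack(): the innermost row level moves back to the columns
unstacked = stacked.unstack()

print(stacked)
print(unstacked)
```

Here stacked is a Series with a two-level index (store, then column label), and unstacked has the same shape as the original DataFrame.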

For a good explanation of pivoting, stacking and unstacking, go to this page.

Reshaping Your DataFrame With Melt()

Melting is very useful when you have data with one or more columns that are identifier variables, while all other columns are considered measured variables.

These measured variables are all “unpivoted” to the row axis. That is, while the measured variables were spread out over the width of the DataFrame, the melt will make sure that they are placed along its height. Or, in yet other words, your DataFrame will now become longer instead of wider.

As a result, you just have two non-identifier columns, namely, ‘variable’ and ‘value’.

Let’s illustrate this with an example:

import pandas as pd

# The `people` DataFrame
people = pd.DataFrame({'FirstName' : ['John', 'Jane'],
                       'LastName' : ['Doe', 'Austen'],
                       'BloodType' : ['A-', 'B+'],
                       'Weight' : [90, 64]})

# Use `melt()` on the `people` DataFrame
print(pd.melt(people, id_vars=['FirstName', 'LastName'], var_name='measurements'))

If you’re looking for more ways to reshape your data, check out the documentation.

10. How To Iterate Over a Pandas DataFrame

You can iterate over the rows of your DataFrame with the help of a for loop in combination with an iterrows() call on your DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['A', 'B', 'C'])

for index, row in df.iterrows():
    print(row['A'], row['B'])

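iterrows() lets you loop over your DataFrame rows as (index, Series) pairs. In other words, it gives you (index, row) tuples as a result. Note that it builds a new Series for every row, so it is convenient rather than fast: on large DataFrames, itertuples() or vectorized column operations are usually quicker.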
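
As a rough comparison, the sketch below computes the same per-row sum in three ways: with iterrows(), with itertuples(), and with a vectorized column operation. The variable names are assumptions made for this illustration; the DataFrame mirrors the small 3x3 example used in this section.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                  columns=['A', 'B', 'C'])

# iterrows(): each row arrives as a Series, accessed by column label
sums_iterrows = [row['A'] + row['B'] for index, row in df.iterrows()]

# itertuples(): each row arrives as a namedtuple, accessed by attribute
sums_itertuples = [row.A + row.B for row in df.itertuples(index=False)]

# Vectorized: the addition runs on whole columns, with no Python-level loop
sums_vectorized = (df['A'] + df['B']).tolist()

print(sums_iterrows, sums_itertuples, sums_vectorized)
```

All three produce the same sums; the difference only starts to matter as the number of rows grows.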

11. How To Write a Pandas DataFrame to a File

When you have done your data munging and manipulation with Pandas, you might want to export the DataFrame to another format. This section will cover two ways of outputting your DataFrame: to a CSV or to an Excel file.

Outputting a DataFrame to CSV

To output a Pandas DataFrame as a CSV file, you can use to_csv():

import pandas as pd
# `df` is the DataFrame you want to export
df.to_csv('myDataFrame.csv')

That piece of code seems quite simple, but this is where the difficulties begin for most people, because you will have specific requirements for the output of your data. Maybe you don’t want a comma as a delimiter, or you want to use a particular character encoding.

No worries. You can pass some additional arguments to to_csv() to make sure that your data is written out the way you want it to be!

  • To delimit by a tab, use the sep argument:

    import pandas as pd
    df.to_csv('myDataFrame.csv', sep='\t')
  • To use a specific character encoding, you can use the encoding argument:

    import pandas as pd
    df.to_csv('myDataFrame.csv', sep='\t', encoding='utf-8')
  • Furthermore, you can specify how you want your NaN or missing values to be represented (na_rep), whether or not you want to output the header (header), whether or not you want to write out the row names (index), whether you want compression (compression), and more. Read up on the options here.
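
A few of these options can be combined in one call. The following is a minimal sketch, assuming a small made-up DataFrame with one missing value; it leaves out the file path so that to_csv() returns the text instead of writing a file, which makes the result easy to inspect.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
df = pd.DataFrame({'name': ['John', 'Jane'], 'weight': [90.0, np.nan]})

# With no path given, to_csv() returns the CSV text as a string.
# na_rep sets the placeholder for missing values,
# index=False drops the row names, header=True keeps the column names.
csv_text = df.to_csv(sep='\t', na_rep='NA', index=False, header=True)
print(csv_text)
```

Passing a file name as the first argument writes the same content to disk instead.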

Writing a DataFrame to Excel

Very similar to what you did to output your DataFrame to CSV, you can use to_excel() to write your table to Excel. However, it is a bit more complicated:

import pandas as pd
# The `with` block saves and closes the file on exit
with pd.ExcelWriter('myDataFrame.xlsx') as writer:
    df.to_excel(writer, sheet_name='DataFrame')

Note, however, that, just like with to_csv(), you have a lot of extra arguments, such as startcol and startrow, to make sure your data is output correctly. Go to this page to read up on them.

If, however, you want more information on IO tools in Pandas, you can check out this page.

Python For Data Science Is More Than Pandas DataFrames

That’s it! You've successfully completed the Pandas DataFrame tutorial!

You’re on your way to becoming a master in Pandas DataFrames.

The answers to the 11 frequently asked Pandas questions represent important functions that you will need to import, clean and manipulate your data for your data science work. Not sure that you have gone deep enough into this matter? Our Importing Data In Python course will help you out! If you’ve got the hang of this, you might want to see Pandas at work in a real-life project. The Importance of Preprocessing in Data Science and the Machine Learning Pipeline tutorial series is a must-read and the open course Introduction to Python & Machine Learning is a must-complete.

Comments

romeroreimer
I always find a very nice information in this place, Thanks!
08/04/17 11:55 PM |
nguye25v
Thanks for the helpful tutorial! Finally i understand `loc` and `iloc`. I'm just wondering what the advantage(s) of using pandas over R's dpyr are. I find the syntax and functions in dplyr so much more intuitive.
04/01/17 8:21 PM |
rizkyaenm
Thanks for the tutorial. it's really helpful for me as beginner to learn python and pandas :)
03/10/17 4:23 AM |