CheatSheet
A DataFrame is similar to a two-dimensional NumPy array, but it comes with column and row labels and each column can hold different data types. By extracting a single column or row from a DataFrame, you get a one-dimensional Series. Again, a Series is similar to a one-dimensional NumPy array with labels.
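To make the DataFrame/Series relationship concrete, here is a minimal sketch. The small `people` DataFrame and its values are made up for this illustration only.

```python
import pandas as pd

# Hypothetical two-column DataFrame, used only for illustration
people = pd.DataFrame({"name": ["Mark", "John"], "age": [55, 33]},
                      index=[1001, 1000])

# Selecting a single column returns a one-dimensional Series
ages = people["age"]

# Selecting a single row by its label also returns a Series
first_row = people.loc[1001]

print(type(people).__name__)     # DataFrame
print(type(ages).__name__)       # Series
print(type(first_row).__name__)  # Series
```

Note that the column Series keeps the row labels as its index, while the row Series uses the column names as its index.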

    import pandas as pd

    # Load the course participants table from an Excel file
    pd.read_excel("xl/course_participants.xlsx")

    # One way of creating a DataFrame is to provide the data as a nested
    # list, along with values for columns and index:
    
    data = [["Mark", 55, "Italy", 4.5, "Europe"],
            ["John", 33, "USA", 6.7, "America"],
            ["Tim", 41, "USA", 3.9, "America"],
            ["Jenny", 12, "Germany", 9.0, "Europe"]]
    df = pd.DataFrame(data=data,
                      columns=["name", "age", "country",
                               "score", "continent"],
                      index=[1001, 1000, 1002, 1003])
    
    df
    # By calling the info method, you will get some basic information, most importantly
    # the number of data points and the data types for each column:
    df.info()
    # <class 'pandas.core.frame.DataFrame'>
    # Int64Index: 4 entries, 1001 to 1003
    # Data columns (total 5 columns):
    #  #   Column     Non-Null Count  Dtype
    # ---  ------     --------------  -----
    #  0   name       4 non-null      object
    #  1   age        4 non-null      int64
    #  2   country    4 non-null      object
    #  3   score      4 non-null      float64
    #  4   continent  4 non-null      object
    # dtypes: float64(1), int64(1), object(3)
    # memory usage: 192.0+ bytes
    # If you are just interested in the data type of your columns, run df.dtypes instead.
    # Columns with strings or mixed data types will have the data type object. Let us now
    # have a closer look at the index and columns of a DataFrame.
    
    df.dtypes
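As a small sketch of how `dtypes` can be used, the example below rebuilds the sample DataFrame and inspects its numeric columns. The dtype reported for string columns can differ between pandas versions, so only the numeric columns are printed here. Using `select_dtypes` is an addition for illustration and not part of the original walkthrough.

```python
import pandas as pd

data = [["Mark", 55, "Italy", 4.5, "Europe"],
        ["John", 33, "USA", 6.7, "America"],
        ["Tim", 41, "USA", 3.9, "America"],
        ["Jenny", 12, "Germany", 9.0, "Europe"]]
df = pd.DataFrame(data=data,
                  columns=["name", "age", "country", "score", "continent"],
                  index=[1001, 1000, 1002, 1003])

# dtypes is itself a Series: column names form the index, dtypes are the values
dtypes = df.dtypes
print(dtypes["age"])    # int64
print(dtypes["score"])  # float64

# select_dtypes keeps only the columns of the requested kind
numeric = df.select_dtypes(include="number")
print(list(numeric.columns))  # ['age', 'score']
```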

    Index

    The row labels of a DataFrame are called the index. If you don’t have a meaningful index, leave it out when constructing a DataFrame; pandas will then automatically create an integer index starting at zero. An index allows pandas to look up data faster and is essential for many common operations, e.g., combining two DataFrames. You access the index object as follows:

    df.index
    #If it makes sense, give the index a name. Let’s follow the table in Excel, and give it the
    #name user_id:
    df.index.name = "user_id"
    df.head()
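The difference between the automatic index and an explicit, named one can be sketched as follows. The DataFrames `df_default` and `df_named` are hypothetical names introduced for this example.

```python
import pandas as pd

# No index given: pandas creates a default integer index starting at zero
df_default = pd.DataFrame({"name": ["Mark", "John", "Tim"]})
print(list(df_default.index))  # [0, 1, 2]

# Explicit index, which is then given a name
df_named = pd.DataFrame({"name": ["Mark", "John", "Tim"]},
                        index=[1001, 1000, 1002])
df_named.index.name = "user_id"
print(list(df_named.index))  # [1001, 1000, 1002]
print(df_named.index.name)   # user_id
```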

    Unlike the primary key of a database, a DataFrame index can have duplicates, but looking up values may be slower in that case. To turn an index into a regular column use reset_index, and to set a new index use set_index. If you don’t want to lose your existing index when setting a new one, make sure to reset it first:

    # "reset_index" turns the index into a column, replacing the
    # index with the default index. This corresponds to the DataFrame
    # from the beginning that we loaded from Excel.
    df.reset_index()
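    # "reset_index" turns "user_id" into a regular column and
    # "set_index" turns the column "name" into the index
    df.reset_index().set_index("name")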
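Returning to the remark that an index may contain duplicates, the following sketch shows one way to check for them. The DataFrame `dup` and its labels are invented for this example.

```python
import pandas as pd

# Hypothetical DataFrame whose index contains the label "a" twice
dup = pd.DataFrame({"score": [4.5, 6.7, 3.9]}, index=["a", "b", "a"])

# is_unique tells you whether every label occurs only once
print(dup.index.is_unique)  # False

# Looking up a duplicated label returns all matching rows
print(dup.loc["a", "score"].tolist())  # [4.5, 3.9]

# duplicated() flags every repeated occurrence after the first
print(dup.index.duplicated().tolist())  # [False, False, True]
```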

    By doing df.reset_index().set_index("name"), you are using method chaining: since reset_index() returns a DataFrame, you can directly call another DataFrame method without having to write out the intermediate result first.
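A short sketch of what method chaining saves you: the chained call and the step-by-step version with an intermediate variable produce the same result. The reduced two-column DataFrame is an assumption made to keep the example compact.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mark", "John"], "age": [55, 33]},
                  index=[1001, 1000])
df.index.name = "user_id"

# Chained: reset_index() returns a DataFrame, so set_index() can follow directly
chained = df.reset_index().set_index("name")

# Step by step, writing out the intermediate result
intermediate = df.reset_index()
stepwise = intermediate.set_index("name")

print(chained.equals(stepwise))  # True
print(chained.index.name)        # name
print(list(chained.columns))     # ['user_id', 'age']
```

Because the index was reset first, the old index survives as the regular column `user_id` instead of being discarded.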

    DataFrame Methods Return Copies

    Whenever you call a method on a DataFrame in the form df.method_name(), you will get back a copy of the DataFrame with that method applied, leaving the original DataFrame untouched. We have just done that by calling df.reset_index(). If you wanted to change the original DataFrame, you would have to assign the return value back to the original variable like the following:

    df = df.reset_index()

    Since we are not doing this, our variable df is still holding its original data. The next samples also call DataFrame methods, i.e., they don’t change the original DataFrame.
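The copy behavior described here can be verified with a minimal sketch. The reduced two-column DataFrame is an assumption made for brevity.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mark", "John"], "age": [55, 33]},
                  index=[1001, 1000])
df.index.name = "user_id"

# The return value is discarded, so df itself does not change
df.reset_index()
print(df.index.name)     # user_id
print(list(df.columns))  # ['name', 'age']

# Assigning the return value back replaces the original DataFrame
df = df.reset_index()
print(list(df.index))    # [0, 1]
print(list(df.columns))  # ['user_id', 'name', 'age']
```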

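    # To change the index, use the reindex method. Rows whose labels match the
    # new index are kept; labels without a matching row are filled with NaN: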
    df.reindex([999, 1000, 1001, 1004])
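To see what `reindex` does with labels that do not exist in the original index, consider this sketch. It uses a reduced version of the sample data, with only the `name` and `age` columns, as an assumption to keep the output short.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Mark", "John", "Tim", "Jenny"],
                   "age": [55, 33, 41, 12]},
                  index=[1001, 1000, 1002, 1003])

# 1000 and 1001 exist in df; 999 and 1004 do not
reindexed = df.reindex([999, 1000, 1001, 1004])

print(reindexed.index.tolist())         # [999, 1000, 1001, 1004]
print(reindexed.loc[1000, "name"])      # John
print(reindexed.loc[999].isna().all())  # True

# The original DataFrame keeps its index
print(df.index.tolist())                # [1001, 1000, 1002, 1003]
```

Rows 1002 and 1003 are dropped from the result because their labels are not part of the new index.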