Data Manipulation with pandas

    Run the hidden code cell below to import the data used in this course.

    # Import the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Import the four datasets
    avocado = pd.read_csv("datasets/avocado.csv")
    homelessness = pd.read_csv("datasets/homelessness.csv")
    temperatures = pd.read_csv("datasets/temperatures.csv")
    walmart = pd.read_csv("datasets/walmart.csv")

    Take Notes

    Add notes about the concepts you've learned and code cells with code you want to keep.

    #DATA MANIPULATION WITH PANDAS
    "Introduction to pandas" 
    #Exploring a DataFrame 
    
    #Chapter 1: DataFrames 
    "Sorting and subsetting" 
    "Creating new columns"
    
    #Chapter 2: Aggregating Data 
    'Summary statistics'
    'Counting'
    'Grouped summary statistics'
    
    #Chapter 3: Slicing and Indexing Data
    'Subsetting using slicing'
    'Indexes and subsetting using indexes'
    
    #Chapter 4: Creating and Visualizing Data
    'Plotting'
    'Handling missing data'
    'Reading data into a DataFrame'
    
    #Exploring a DataFrame:
    
    dogs.head()
    # the first few rows (the “head” of the DataFrame)
    dogs.info()
    # shows information on each column, such as its data type and number of non-null values
    dogs.shape
    # returns the number of rows and columns of the DataFrame
    dogs.describe()
    # calculates summary statistics (count, mean, std, min, quartiles, max) for each numeric column
    dogs.values
    # A two-dimensional NumPy array of values.
    dogs.columns
    # An index of columns: the column names.
    dogs.index
    # An index for the rows: either row numbers or row names.
    
    #Chapter 1: DataFrames
    'Sorting and subsetting'
    
    dogs.sort_values("weight_kg")
    # Sort by weight, lightest dog first
    dogs.sort_values("weight_kg", ascending=False)
    # Sort by weight, heaviest dog first
    dogs.sort_values(["weight_kg", "height_cm"])
    # Sort by weight (lightest first), then break ties by height (shortest first)
    dogs.sort_values(["weight_kg", "height_cm"], ascending=[True, False])
    # Sort by weight (lightest first), then break ties by height (tallest first)
    
    "Subsetting Columns"
    
    
    dogs["name"]
    # Subset one column: DataFrame["column_name"] returns a Series
    dogs[["name", "weight_kg"]]
    # Subset multiple columns: pass a list of names, which returns a DataFrame
    
    "Subsetting Rows"
    
    
    dogs["height_cm"] > 50
    # Returns a Boolean Series: True or False for each row
    
    dogs[dogs["height_cm"] > 50]
    # Get the rows of dogs taller than 50 cm
    
    dogs[dogs["breed"] == "Labrador"]
    # Subset rows based on text data (the comparison is case-sensitive)
    
    dogs[dogs["date_of_birth"] < "2015-01-01"]
    #Subsetting based on dates
    
    is_lab = dogs["breed"] == "Labrador"
    is_brown = dogs["color"] == "Brown"
    dogs[is_lab & is_brown]
    # Subset rows based on multiple conditions combined with &
    
    is_black_or_brown = dogs["color"].isin(["Black", "Brown"])
    dogs[is_black_or_brown]
    # Subset rows matching any of several values using .isin()
    
    
    "Creating a new column"
    
    dogs["height_m"] = dogs["height_cm"] / 100
    # Add a new column: height in metres
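
The notes above refer to a `dogs` DataFrame that is not among the datasets loaded in this workspace. To try the sorting and subsetting patterns, here is a minimal sketch using a small stand-in table. The rows and values are made up for illustration; only the column names come from the notes.

```python
import pandas as pd

# Hypothetical stand-in for the course's dogs DataFrame (all values are made up)
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer", "Labrador"],
    "color": ["Brown", "Black", "Brown", "Gray", "Black"],
    "height_cm": [56, 43, 46, 49, 59],
    "weight_kg": [24, 24, 24, 17, 29],
    "date_of_birth": pd.to_datetime(
        ["2013-07-01", "2016-09-16", "2014-08-25", "2011-12-11", "2017-01-20"]
    ),
})

# Sort by weight ascending, breaking ties by height descending
by_weight_then_height = dogs.sort_values(
    ["weight_kg", "height_cm"], ascending=[True, False]
)

# Boolean masks are combined element-wise with & (and) or | (or)
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"
brown_labs = dogs[is_lab & is_brown]

# .isin() keeps rows whose value appears in the given list
black_or_brown = dogs[dogs["color"].isin(["Black", "Brown"])]

# A datetime column can be compared with an ISO-format date string
born_before_2015 = dogs[dogs["date_of_birth"] < "2015-01-01"]

# New column derived from an existing one
dogs["height_m"] = dogs["height_cm"] / 100

print(by_weight_then_height[["name", "weight_kg", "height_cm"]])
print(brown_labs["name"].tolist())
```

Each condition is kept in its own named mask so that combined filters stay readable. When conditions are written inline instead, each one needs its own parentheses, for example `dogs[(dogs["breed"] == "Labrador") & (dogs["color"] == "Brown")]`.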

    Add your notes here

    # Add total col as sum of individuals and family_members
    homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]
    
    # Add p_individuals col as proportion of total that are individuals
    homelessness["p_individuals"] = homelessness["individuals"] / homelessness["total"]
    
    # See the result
    print(homelessness)
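
To check the two derived columns without the CSV file, the same steps can be run on a tiny made-up table. This is only a sketch: the `state` labels and all the numbers are invented, and only `individuals` and `family_members` are column names taken from the cell above. It also shows one edge case worth knowing about: a row whose total is zero gets `NaN` as its proportion instead of raising an error.

```python
import pandas as pd

# Made-up stand-in for homelessness.csv; "state" and all values are hypothetical
homelessness = pd.DataFrame({
    "state": ["A", "B", "C"],
    "individuals": [300, 50, 0],
    "family_members": [100, 150, 0],
})

# Same two steps as in the cell above
homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]
homelessness["p_individuals"] = homelessness["individuals"] / homelessness["total"]

# Row C has total == 0, so its proportion is NaN (0 / 0)
print(homelessness)

# New columns can be used for filtering and sorting like any other column;
# NaN compares as False, so row C is dropped by the filter
mostly_individuals = (
    homelessness[homelessness["p_individuals"] > 0.5]
    .sort_values("p_individuals", ascending=False)
)
print(mostly_individuals)
```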

    Explore Datasets

    Use the DataFrames imported in the first cell to explore the data and practice your skills!

    • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
    • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
    • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
    • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
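
As a starting point for the first two exercises, here is one possible approach on small made-up tables. It is a sketch under assumptions, not a reference solution: the column names `department`, `weekly_sales`, `type`, and `year` are guesses about the real CSV files (only `nb_sold` is named in the exercise text), so check `walmart.columns` and `avocado.columns` before adapting it.

```python
import pandas as pd

# Made-up stand-in for walmart.csv; column names are assumptions
walmart = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3],
    "weekly_sales": [100.0, 250.0, 80.0, 60.0, 400.0, 390.0],
})

# Highest weekly sales per department, largest first, limited to five rows
top_departments = (
    walmart.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(5)
)
print(top_departments)

# Made-up stand-in for avocado.csv; "type" and "year" are assumed column names
avocado = pd.DataFrame({
    "year": [2016, 2017, 2017, 2017],
    "type": ["organic", "organic", "conventional", "organic"],
    "nb_sold": [10.0, 20.0, 30.0, 5.0],
})

# Combine two conditions, then sum only the matching nb_sold values
is_organic_2017 = (avocado["type"] == "organic") & (avocado["year"] == 2017)
total_organic_2017 = avocado.loc[is_organic_2017, "nb_sold"].sum()
print(total_organic_2017)
```

For the two plotting exercises, the same group-then-sort pattern produces a Series that can be passed to `.plot(kind="bar")` or `.plot(kind="barh")`.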