Data Manipulation with pandas

    Run the hidden code cell below to import the data used in this course.

    # Import the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    
    # Import the four datasets
    avocado = pd.read_csv("datasets/avocado.csv")
    homelessness = pd.read_csv("datasets/homelessness.csv")
    temperatures = pd.read_csv("datasets/temperatures.csv")
    walmart = pd.read_csv("datasets/walmart.csv")


    Explore Datasets

    Use the DataFrames imported in the first cell to explore the data and practice your skills!

    • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
    • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
    • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
    • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
    print(walmart.head())
    walmart.info()
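    # Highest weekly sales per department, sorted so the top five come first
    top_departments = walmart.groupby('department')['weekly_sales'].max().sort_values(ascending=False).head(5)
    print(top_departments)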
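The group, aggregate, sort, and limit pattern used for the first exercise can be checked on a small example. This is a minimal sketch on made-up data: `toy_sales` and its values are hypothetical and only stand in for the walmart DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for the walmart DataFrame
toy_sales = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3],
    "weekly_sales": [100.0, 250.0, 400.0, 50.0, 300.0, 120.0],
})

# Maximum per department, then sort descending and keep the top two
top_departments = (
    toy_sales.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(2)
)
print(top_departments)
```

Sorting before calling `.head()` matters here: without it, `.head()` returns the first groups in key order rather than the largest ones.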
    avocado.head()
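    # Total nb_sold for organic avocados sold in 2017
    # (assumes the DataFrame has 'type' and 'year' columns)
    organic_2017 = avocado[(avocado['type'] == 'organic') & (avocado['year'] == 2017)]
    total_nb_sold = organic_2017['nb_sold'].sum()
    print(total_nb_sold)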
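The second exercise asks for a total restricted to one avocado type and one year, which means combining two boolean conditions before summing. A minimal sketch on made-up data follows; `toy_avocado` is hypothetical, and the `type` and `year` column names are assumptions about how the avocado DataFrame is laid out.

```python
import pandas as pd

# Hypothetical toy data; 'type' and 'year' are assumed column names
toy_avocado = pd.DataFrame({
    "type": ["organic", "conventional", "organic", "organic"],
    "year": [2017, 2017, 2016, 2017],
    "nb_sold": [10.0, 99.0, 5.0, 20.0],
})

# Each condition needs its own parentheses because & binds tighter than ==
is_organic_2017 = (toy_avocado["type"] == "organic") & (toy_avocado["year"] == 2017)

# Sum nb_sold over the matching rows only
total = toy_avocado.loc[is_organic_2017, "nb_sold"].sum()
print(total)  # 30.0
```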
    homelessness.head()
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Sum 'individuals' per region and order the regions from highest to lowest
    region_totals = homelessness.groupby('region', as_index=False)['individuals'].sum().sort_values('individuals', ascending=False)
    
    # Vertical bar plot with rotated x-axis labels
    sns.barplot(data=region_totals, x='region', y='individuals')
    plt.xticks(rotation=45)
    plt.show()
    
    # Bonus: horizontal bar chart of the same totals
    sns.barplot(data=region_totals, y='region', x='individuals')
    plt.show()
    temperatures.head()
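    # Parse the dates so the x-axis is a time axis rather than a set of string labels
    temperatures['date'] = pd.to_datetime(temperatures['date'])
    
    # Filter the temperatures DataFrame for Toronto and Rome
    toronto_temps = temperatures[temperatures['city'] == 'Toronto']
    rome_temps = temperatures[temperatures['city'] == 'Rome']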
    
    # Create the line plot
    plt.figure(figsize=(12, 6))
    plt.plot(toronto_temps['date'], toronto_temps['avg_temp_c'], label='Toronto')
    plt.plot(rome_temps['date'], rome_temps['avg_temp_c'], label='Rome')
    
    # Label the plot
    plt.title('Average Temperature in Toronto and Rome Over Time')
    plt.xlabel('Date')
    plt.ylabel('Average Temperature (°C)')
    
    # Add a legend
    plt.legend()
    
    # Display the plot
    plt.show()
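An alternative to filtering each city separately is to reshape the data so each city becomes a column, then let pandas draw one line per column. The sketch below uses made-up values: `toy_temps` is hypothetical and only mirrors the `date`, `city`, and `avg_temp_c` columns used above.

```python
import pandas as pd

# Hypothetical long-format data mirroring the columns used above
toy_temps = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-02-01", "2000-02-01"]),
    "city": ["Toronto", "Rome", "Toronto", "Rome"],
    "avg_temp_c": [-5.0, 8.0, -4.0, 9.5],
})

# One row per date, one column per city
wide = toy_temps.pivot(index="date", columns="city", values="avg_temp_c")

# DataFrame.plot draws one line per column and adds a legend from the column names
ax = wide.plot(figsize=(12, 6), title="Average Temperature in Toronto and Rome Over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Average Temperature (°C)")
```

Outside a notebook, call `plt.show()` from `matplotlib.pyplot` to display the figure. `pivot` requires each date and city pair to appear only once; if a pair repeats, `pivot_table` with an aggregation function is the usual substitute.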