Data Manipulation with pandas
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
# Add your code snippets here
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
- What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
- Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
- Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
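# info() prints its summary itself and returns None, so it needs no print()
walmart.info()
# Highest weekly sales per department, then the top five in descending order
walmart_wkly_sales = walmart.groupby("department")["weekly_sales"].max()
print(walmart_wkly_sales.sort_values(ascending=False).head(5))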
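The group-then-rank pattern above can be checked on a frame small enough to verify by hand. This is a minimal sketch, assuming a hypothetical `toy_sales` frame that reuses the `department` and `weekly_sales` column names; the values are invented for illustration and are not taken from the walmart data.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only
toy_sales = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3],
    "weekly_sales": [10.0, 50.0, 30.0, 20.0, 70.0, 5.0],
})

# Highest weekly sale per department
dept_max = toy_sales.groupby("department")["weekly_sales"].max()

# nlargest(2) returns the two largest values in descending order; on this
# data it matches sort_values(ascending=False).head(2)
top_two = dept_max.nlargest(2)
print(top_two)
```

Department 3 comes first with 70.0, followed by department 1 with 50.0, which is easy to confirm by reading the six rows directly.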
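avocado.head()
# Keep only organic avocados sold in 2017 (assumes the type is stored in a "type" column)
organic_2017 = avocado[(avocado["year"] == 2017) & (avocado["type"] == "organic")]
organic_2017["nb_sold"].sum()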
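The question asks for organic avocados only, so the year filter has to be combined with a filter on avocado type. This is a minimal sketch on a hypothetical `toy_avocado` frame, assuming the type is stored in a column named `type` with the value "organic". That column name is an assumption, so check it against the output of `avocado.head()`. The numbers are invented.

```python
import pandas as pd

# Hypothetical toy data; "type" as the column name is an assumption
toy_avocado = pd.DataFrame({
    "year": [2016, 2017, 2017, 2017],
    "type": ["organic", "organic", "conventional", "organic"],
    "nb_sold": [100.0, 200.0, 999.0, 300.0],
})

# Build one boolean mask per condition, then combine them with &
is_2017 = toy_avocado["year"] == 2017
is_organic = toy_avocado["type"] == "organic"

# Total for organic avocados in 2017 only
organic_total = toy_avocado.loc[is_2017 & is_organic, "nb_sold"].sum()

# Total for every avocado in 2017, for comparison
year_only_total = toy_avocado.loc[is_2017, "nb_sold"].sum()

print(organic_total, year_only_total)
```

The two totals differ by exactly the conventional row, which shows how much a year-only selection would overstate the answer.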
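homelessness.head()
# Total individuals per region, largest first
homelessness_region = homelessness.groupby("region")["individuals"].sum().sort_values(ascending=False)
homelessness_region.plot(kind="bar")
plt.show()
# Bonus: barh draws the first row at the bottom, so sort ascending to put the largest bar on top
homelessness_region.sort_values().plot(kind="barh")
plt.show()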
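temperatures.head()
# Index by date so the x-axis shows time rather than the city name (assumes a "date" column)
temperatures_region = temperatures.set_index("date")
for city in ["Toronto", "Rome"]:
    temperatures_region.loc[temperatures_region["city"] == city, "avg_temp_c"].plot(kind="line", label=city)
plt.xlabel("Date")
plt.ylabel("Average temperature (°C)")
plt.legend()
plt.show()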
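An alternative way to prepare the two-line plot is to reshape the data so that each city becomes its own column. This is a minimal sketch on a hypothetical `toy_temps` frame, assuming the dates are stored in a column named `date`. That name is an assumption, and the temperatures are invented.

```python
import pandas as pd

# Hypothetical toy data; "date" as the column name is an assumption
toy_temps = pd.DataFrame({
    "date": pd.to_datetime([
        "2000-01-01", "2000-01-01", "2000-02-01", "2000-02-01", "2000-01-01",
    ]),
    "city": ["Toronto", "Rome", "Toronto", "Rome", "Paris"],
    "avg_temp_c": [-5.0, 8.0, -3.0, 9.0, 4.0],
})

# Keep only the two cities of interest
two_cities = toy_temps[toy_temps["city"].isin(["Toronto", "Rome"])]

# One row per date, one column per city
wide = two_cities.pivot(index="date", columns="city", values="avg_temp_c")
print(wide)
```

Calling `wide.plot()` on a frame shaped like this draws one line per column and takes the legend entries from the column names, so no manual legend list is needed.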