Data Manipulation with pandas
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
- What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
- Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
- Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
print(walmart.head())
walmart.info()
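# Highest weekly sales per department, sorted so that head(5) returns the top five in descending order
print(walmart.groupby('department')['weekly_sales'].max().sort_values(ascending=False).head(5))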
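A small check on a toy frame may help show why the sort matters for the first task. The data below is made up for illustration and only borrows the department and weekly_sales column names; it is a sketch, not the walmart data.

```python
import pandas as pd

# Toy stand-in for the walmart DataFrame (values are invented for illustration)
toy = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3],
    "weekly_sales": [10.0, 50.0, 300.0, 20.0, 120.0, 80.0],
})

# Highest weekly sales per department; the result is ordered by department label
max_sales = toy.groupby("department")["weekly_sales"].max()

# head() keeps that label order, so it is not a "top N"
first_two = max_sales.head(2)

# Sorting first (or using nlargest) gives the largest values in descending order
top_two = max_sales.sort_values(ascending=False).head(2)
top_two_alt = max_sales.nlargest(2)

print(first_two)
print(top_two)
```

Either sort_values(ascending=False).head(n) or nlargest(n) should work here; nlargest is slightly shorter when only the top few rows are needed.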
avocado.head()
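# Keep only organic avocados sold in 2017 (assumes avocado.csv has 'type' and 'year' columns, as shown by avocado.head())
organic_2017 = avocado[(avocado['type'] == 'organic') & (avocado['year'] == 2017)]
total_nb_sold = organic_2017['nb_sold'].sum()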
print(total_nb_sold)
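The second task asks for a total over a subset of rows, so the sum needs a filter in front of it. The sketch below illustrates the boolean-mask pattern on invented data; the type and year column names are assumptions about the avocado dataset and should be checked against avocado.head().

```python
import pandas as pd

# Toy stand-in for the avocado DataFrame (values are invented for illustration)
toy = pd.DataFrame({
    "type": ["organic", "conventional", "organic", "organic"],
    "year": [2017, 2017, 2016, 2017],
    "nb_sold": [100.0, 900.0, 40.0, 25.0],
})

# Combine conditions with & and wrap each one in parentheses
mask = (toy["type"] == "organic") & (toy["year"] == 2017)

# Sum nb_sold over the matching rows only
organic_2017_total = toy.loc[mask, "nb_sold"].sum()
print(organic_2017_total)  # 125.0
```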
homelessness.head()
import seaborn as sns
# Total number of homeless individuals per region, sorted in descending order
region_totals = homelessness.groupby('region')['individuals'].sum().sort_values(ascending=False).reset_index()
# Create the bar plot with bars ordered in descending order
sns.barplot(data=region_totals, x='region', y='individuals')
# Change the orientation of the x-axis labels
plt.xticks(rotation=45)
plt.title('Homeless Individuals by Region')
plt.show()
# Bonus: horizontal bar chart of the same totals
sns.barplot(data=region_totals, y='region', x='individuals')
plt.title('Homeless Individuals by Region')
plt.show()
temperatures.head()
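# Parse the dates so the x-axis is a real time axis rather than a sequence of strings
temperatures['date'] = pd.to_datetime(temperatures['date'])
# Filter the temperatures DataFrame for Toronto and Rome
toronto_temps = temperatures[temperatures['city'] == 'Toronto']
rome_temps = temperatures[temperatures['city'] == 'Rome']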
# Create the line plot
plt.figure(figsize=(12, 6))
plt.plot(toronto_temps['date'], toronto_temps['avg_temp_c'], label='Toronto')
plt.plot(rome_temps['date'], rome_temps['avg_temp_c'], label='Rome')
# Label the plot
plt.title('Average Temperature in Toronto and Rome Over Time')
plt.xlabel('Date')
plt.ylabel('Average Temperature (°C)')
# Add a legend
plt.legend()
# Display the plot
plt.show()
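As an alternative to filtering each city separately, the data can be reshaped so that each city becomes a column. The sketch below uses invented values and only borrows the date, city, and avg_temp_c column names from the cell above.

```python
import pandas as pd

# Toy stand-in for the temperatures DataFrame (values are invented for illustration)
toy = pd.DataFrame({
    "date": ["2000-01-01", "2000-01-01", "2000-02-01", "2000-02-01"],
    "city": ["Toronto", "Rome", "Toronto", "Rome"],
    "avg_temp_c": [-5.0, 8.0, -4.0, 9.0],
})

# Convert the date strings to datetimes so they sort and plot as dates
toy["date"] = pd.to_datetime(toy["date"])

# One row per date, one column per city
wide = toy.pivot(index="date", columns="city", values="avg_temp_c")
print(wide)
```

Calling wide.plot() on a frame shaped like this draws one line per column and adds a legend with the city names, which covers the bonus part of the task in a single call.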