Data Manipulation with pandas
Run the hidden code cell below to import the data used in this course.
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
# Add your code snippets here
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
- What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
- Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
- Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
walmart.head()
#1
walmart1=walmart.groupby('department')['weekly_sales'].max()
walmart1.sort_values(ascending=False).head()
#or
#walmart1=walmart1.to_frame()
#walmart1.sort_values('weekly_sales',ascending=False).head()
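The two routes above (sorting a Series directly, or converting it to a DataFrame first) can be compared with a third option, `Series.nlargest`, on a small stand-in table. This is only a sketch: the rows below are invented for illustration and are not taken from walmart.csv; only the column names `department` and `weekly_sales` come from the code above.

```python
import pandas as pd

# Hypothetical toy data standing in for walmart.csv (values are made up).
toy = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3, 4],
    "weekly_sales": [10.0, 25.0, 40.0, 5.0, 15.0, 30.0, 20.0],
})

# Highest weekly sale per department (a Series indexed by department).
max_by_dept = toy.groupby("department")["weekly_sales"].max()

# Route A: sort descending, then keep the first rows.
top_sorted = max_by_dept.sort_values(ascending=False).head(3)

# Route B: nlargest sorts and truncates in one call.
top_nlargest = max_by_dept.nlargest(3)

print(top_nlargest)
```

On this toy data both routes return the same three departments in the same order, so the choice is mostly about readability.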
avocado.head()
#2
avocado[(avocado['type']=='organic') & (avocado['year']==2017)]['nb_sold'].sum()
homelessness.head()
#3
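# Total homeless people per region, largest first.
# Column names 'region', 'individuals' and 'family_members' are assumed from the course dataset.
homelessness['total'] = homelessness['individuals'] + homelessness['family_members']
homelessness1 = homelessness.groupby('region')['total'].sum().sort_values(ascending=False)
homelessness1 = homelessness1.reset_index()
homelessness1.head()
plt.barh(homelessness1['region'], homelessness1['total'])
plt.gca().invert_yaxis()  # barh draws the first row at the bottom; flip so the largest is on top
plt.tight_layout()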
plt.show()
temperatures.head()
t_t=temperatures[temperatures['city']=='Toronto']
t=temperatures[temperatures['city']=='Rome']
t['date']
#4
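t_t = temperatures[temperatures['city'] == 'Toronto']
t = temperatures[temperatures['city'] == 'Rome']
plt.plot(t_t['date'], t_t['avg_temp_c'], label='Toronto')
plt.plot(t['date'], t['avg_temp_c'], label='Rome')
# Label the axes and add a title, as the exercise asks
plt.xlabel('Date')
plt.ylabel('Average temperature (°C)')
plt.title('Average temperature in Toronto and Rome')
plt.legend()
plt.show()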
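One thing that may be worth checking before plotting: if `date` was read from the CSV as plain text, the x-axis is treated as a set of labels rather than a timeline. The sketch below assumes that situation and shows one possible way to handle it, parsing the dates and reshaping to one column per city. The rows are invented for illustration; only the column names `date`, `city` and `avg_temp_c` come from the code above.

```python
import pandas as pd

# Hypothetical toy rows standing in for temperatures.csv (values are made up).
toy = pd.DataFrame({
    "date": ["2000-01-01", "2000-02-01", "2000-01-01", "2000-02-01"],
    "city": ["Toronto", "Toronto", "Rome", "Rome"],
    "avg_temp_c": [-5.0, -4.0, 8.0, 9.5],
})

# Convert the text dates to datetime64 so they sort and plot chronologically.
toy["date"] = pd.to_datetime(toy["date"])

# Keep the two cities, then reshape: one row per date, one column per city.
two_cities = toy[toy["city"].isin(["Toronto", "Rome"])]
wide = two_cities.pivot(index="date", columns="city", values="avg_temp_c")

print(wide)
```

With the data in this wide shape, calling `wide.plot()` should draw one line per city with a legend, which is an alternative to the two separate `plt.plot` calls.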