Data Manipulation with pandas

DATA MANIPULATION - WALMART, AVOCADO, TEMPERATURE, HOMELESSNESS

# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.

Highest weekly sales for the top 5 departments

We group the rows by department, select weekly_sales, take the maximum within each department, and then keep only the five largest values.

# Import packages
import pandas as pd
import numpy as np

# Load walmart dataset
walmart = pd.read_csv("datasets/walmart.csv")
walmart

highest_sales_dept = walmart.groupby('department')['weekly_sales'].max().nlargest(5)
highest_sales_dept
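
To see what the groupby / max / nlargest chain returns, here is a minimal sketch on a small hand-made frame. The rows in toy_walmart are invented for illustration and are not taken from datasets/walmart.csv; only the column names department and weekly_sales come from the cell above.

```python
import pandas as pd

# Toy data, invented for illustration (not from datasets/walmart.csv)
toy_walmart = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3, 4, 5, 6],
    "weekly_sales": [100.0, 250.0, 80.0, 60.0, 300.0, 120.0, 40.0, 500.0, 10.0],
})

# Highest weekly sales within each department
max_by_dept = toy_walmart.groupby("department")["weekly_sales"].max()

# nlargest(5) keeps the five largest values, sorted in descending order
top_five = max_by_dept.nlargest(5)
print(top_five)
```

Because nlargest already returns its result sorted from largest to smallest, no separate sort_values call should be needed to meet the "descending order" requirement.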

Total nb_sold of organic avocados in 2017

# Load the avocado dataset
avocado = pd.read_csv("datasets/avocado.csv")
avocado

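# Combine both conditions in one mask so the filter does not rely on index alignment
is_organic_2017 = (avocado['type'] == 'organic') & (avocado['year'] == 2017)
total_sold_in_2017 = avocado.loc[is_organic_2017, 'nb_sold'].sum()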
print(total_sold_in_2017)
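
The two-condition filter can be checked on a small hand-made frame. This is a sketch only: the rows in toy_avocado are invented, and it assumes year is stored as an integer column, as the comparison with 2017 in the cell above suggests.

```python
import pandas as pd

# Toy data, invented for illustration (not from datasets/avocado.csv)
toy_avocado = pd.DataFrame({
    "type": ["organic", "conventional", "organic", "organic", "conventional"],
    "year": [2017, 2017, 2016, 2017, 2016],
    "nb_sold": [10.0, 100.0, 20.0, 5.5, 200.0],
})

# Build each condition as its own boolean Series, then combine them with &
is_organic = toy_avocado["type"] == "organic"
is_2017 = toy_avocado["year"] == 2017

# Select nb_sold for rows matching both conditions and add it up
organic_2017_total = toy_avocado.loc[is_organic & is_2017, "nb_sold"].sum()
print(organic_2017_total)
```

Only the first and fourth rows satisfy both conditions, so the total is 10.0 + 5.5.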

Next, we total homelessness and visualize it with a bar plot. The cell below groups by state rather than by region.

# Load the homelessness dataset
homelessness = pd.read_csv("datasets/homelessness.csv")
homelessness

# Total individuals per state
by_state = homelessness.groupby('state')['individuals'].sum()

# Turn the result into a DataFrame and rename the 'state' column to 'states'
by_state_df = by_state.reset_index().rename(columns={'state': 'states'})
by_state_df
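
The task asks for totals by region, shown as a bar plot in descending order. A minimal sketch of that step follows, under two assumptions: that the dataset has a column named region (the name is a guess based on the task wording), and that "number of homeless people" is measured by the individuals column used above. The rows in toy_homelessness are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data, invented for illustration (not from datasets/homelessness.csv)
# The 'region' column name is an assumption based on the task description
toy_homelessness = pd.DataFrame({
    "region": ["Pacific", "Pacific", "Mountain", "New England", "Mountain"],
    "state": ["California", "Oregon", "Colorado", "Maine", "Utah"],
    "individuals": [1000.0, 300.0, 200.0, 50.0, 100.0],
})

# Total individuals per region, largest first
by_region = (
    toy_homelessness.groupby("region")["individuals"]
    .sum()
    .sort_values(ascending=False)
)

fig, ax = plt.subplots()
# A horizontal bar chart draws from the bottom up, so plot in ascending
# order to place the largest bar at the top
by_region.sort_values().plot(kind="barh", ax=ax)
ax.set_xlabel("Individuals")
ax.set_ylabel("Region")
ax.set_title("Homeless individuals by region (toy data)")
fig.tight_layout()
```

For a vertical chart, plotting by_region directly with kind="bar" would keep the descending order from left to right.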
