DATA MANIPULATION - WALMART, AVOCADO, TEMPERATURE, HOMELESSNESS


# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order.
  • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame?
  • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart.
  • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines.

Highest weekly sales for the top 5 departments

We select the weekly_sales column, group it by department, take the maximum within each group, and keep only the five largest values. nlargest(5) returns them in descending order.

# Import packages
import pandas as pd
import numpy as np

# Load walmart dataset
walmart = pd.read_csv("datasets/walmart.csv")
walmart

# Maximum weekly sales per department, limited to the five largest (descending)
highest_sales_dept = walmart.groupby('department')['weekly_sales'].max().nlargest(5)
highest_sales_dept
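
To make the chain above easier to follow, here is a minimal sketch that splits it into steps on a tiny, made-up DataFrame. The rows and the names toy, max_by_dept, top_two and top_two_sorted are invented for illustration and are not taken from walmart.csv. It also shows that nlargest gives the same result as an explicit descending sort followed by head.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from walmart.csv.
toy = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3],
    "weekly_sales": [100.0, 250.0, 80.0, 90.0, 300.0, 120.0],
})

# Step 1: one maximum per department.
max_by_dept = toy.groupby("department")["weekly_sales"].max()

# Step 2: keep the largest values; nlargest returns them in descending order.
top_two = max_by_dept.nlargest(2)

# Equivalent spelling with an explicit sort.
top_two_sorted = max_by_dept.sort_values(ascending=False).head(2)

print(top_two)
```

On the real data the same pattern applies with nlargest(5).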

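Total number of organic avocados sold in 2017

# Load the avocado dataset
avocado = pd.read_csv("datasets/avocado.csv")
avocado

# Combine both conditions in one boolean mask, then sum nb_sold for the matching rows
is_organic_2017 = (avocado['type'] == 'organic') & (avocado['year'] == 2017)
total_sold_in_2017 = avocado.loc[is_organic_2017, 'nb_sold'].sum()
print(total_sold_in_2017)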
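
One way to sanity-check a filtered total like this is to compute it a second way and compare. The sketch below does that on made-up rows: it sums nb_sold per (type, year) pair with groupby and looks up the organic/2017 cell. The data and the names toy, filtered_total, totals and cross_check are hypothetical; only the column names type, year and nb_sold come from the code above.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from avocado.csv.
toy = pd.DataFrame({
    "type": ["organic", "conventional", "organic", "organic"],
    "year": [2017, 2017, 2016, 2017],
    "nb_sold": [10.0, 500.0, 20.0, 30.0],
})

# Approach 1: filter the rows, then sum.
filtered_total = toy.loc[
    (toy["type"] == "organic") & (toy["year"] == 2017), "nb_sold"
].sum()

# Approach 2: sum per (type, year) group, then look up one cell.
totals = toy.groupby(["type", "year"])["nb_sold"].sum()
cross_check = totals.loc[("organic", 2017)]

print(filtered_total, cross_check)
```

If the two numbers disagree on the real data, the filter conditions are the first thing to inspect.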

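Next, we total the number of homeless individuals and prepare the result for a bar plot. The exercise asks for totals by region; the code below aggregates by state and keeps the result as a DataFrame.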

# Load the homelessness dataset
homelessness = pd.read_csv("datasets/homelessness.csv")
homelessness

# Total individuals per state
by_state = homelessness.groupby('state')['individuals'].sum()

# Turn the Series into a DataFrame with 'state' as a regular column, renamed to 'states'
by_state_df = by_state.reset_index().rename(columns={'state': 'states'})
by_state_df
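
For the regional bar chart the exercise asks for, a minimal sketch could look like the following. It assumes the DataFrame has a region column next to state and individuals, and it counts only individuals as "homeless people"; both points are assumptions, and the rows and region labels below are made up for illustration rather than taken from homelessness.csv.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration only; these are not values from homelessness.csv.
toy = pd.DataFrame({
    "region": ["West", "West", "East", "East", "South"],
    "state": ["A", "B", "C", "D", "E"],
    "individuals": [100, 50, 30, 40, 20],
})

# Total individuals per region, largest first.
by_region = (
    toy.groupby("region")["individuals"].sum().sort_values(ascending=False)
)

# Horizontal bars are drawn bottom-up, so plot in ascending order
# to put the largest region at the top of the chart.
ax = by_region.sort_values().plot(kind="barh")
ax.set_xlabel("Homeless individuals")
ax.set_ylabel("Region")
ax.set_title("Homeless individuals by region (toy data)")
plt.tight_layout()
plt.show()
```

For a vertical chart, by_region.plot(kind="bar") on the descending Series orders the bars from largest to smallest, left to right.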