Data Manipulation with pandas 🐼
```python
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the four datasets
avocado = pd.read_csv("datasets/avocado.csv")
homelessness = pd.read_csv("datasets/homelessness.csv")
temperatures = pd.read_csv("datasets/temperatures.csv")
walmart = pd.read_csv("datasets/walmart.csv")
```
Explore Datasets 🎯
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Print the highest weekly sales for each `department` in the `walmart` DataFrame. Limit your results to the top five departments, in descending order.
- What was the total `nb_sold` of organic avocados in 2017 in the `avocado` DataFrame?
- Create a bar plot of the total number of homeless people by region in the `homelessness` DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart.
- Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines.
First Problem 🤨
- We group `weekly_sales` by `department`.
- We take the maximum weekly sales in each department.
- Finally, we sort the results in descending order and keep the top five.
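The steps above can be tried on a small, made-up DataFrame before running them on the real data. The department numbers and sales figures below are hypothetical and are not taken from `walmart.csv`. The sketch also shows `Series.nlargest`, which folds the descending sort and the `head(5)` into a single call.

```python
import pandas as pd

# Hypothetical toy data standing in for walmart.csv
sales = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3, 4, 5, 6],
    "weekly_sales": [100.0, 250.0, 80.0, 90.0, 400.0, 10.0, 300.0, 50.0, 75.0],
})

# Step 1-2: group weekly_sales by department and take each group's maximum
max_by_dept = sales.groupby("department")["weekly_sales"].max()

# Step 3: sort in descending order and keep the top five departments
top5 = max_by_dept.sort_values(ascending=False).head(5)

# Equivalent one-step alternative
top5_alt = max_by_dept.nlargest(5)

print(top5)
```

With distinct values the two approaches return the same Series. `nlargest` is simply the shorter spelling when only the top rows are needed.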
```python
walmart.head()
```
Bar chart: Department vs weekly_sales (x-axis: `department`, y-axis: `weekly_sales`)
```python
# Highest weekly sales per department, top five in descending order
walmart.groupby('department')['weekly_sales'].max().sort_values(ascending=False).head(5)
```
Second Problem 🎇
- We take the `nb_sold` column.
- We keep only the rows where `type` is `"organic"` and `year` equals 2017.
- Finally, we add up the resulting `nb_sold` values.
- We store the result in `total_avocado`.

```python
avocado.head()
```

```python
# Organic avocados sold in 2017 only, summed into a single number
total_avocado = avocado.loc[(avocado['type'] == 'organic') & (avocado['year'] == 2017), 'nb_sold'].sum()
total_avocado
```
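Filtering on two conditions at once works by combining boolean masks with `&`. Each comparison must be wrapped in parentheses, because `&` binds more tightly than `==` in Python.

Here is a minimal sketch on invented rows. It assumes the avocado data has a `type` column that separates `"conventional"` from `"organic"`. It also shows how `year >= 2017` differs from `year == 2017` when later years are present.

```python
import pandas as pd

# Invented rows; column names mirror avocado.csv, assuming it has a `type` column
avocado_toy = pd.DataFrame({
    "type": ["organic", "conventional", "organic", "organic"],
    "year": [2017, 2017, 2018, 2016],
    "nb_sold": [1000.0, 5000.0, 700.0, 300.0],
})

# Parenthesize each comparison: & has higher precedence than ==
mask = (avocado_toy["type"] == "organic") & (avocado_toy["year"] == 2017)

# .loc with a single column label returns a Series, so .sum() yields a scalar
total_organic_2017 = avocado_toy.loc[mask, "nb_sold"].sum()

# For contrast: year >= 2017 also picks up the 2018 organic row
total_from_2017_on = avocado_toy.loc[
    (avocado_toy["type"] == "organic") & (avocado_toy["year"] >= 2017), "nb_sold"
].sum()

print(total_organic_2017, total_from_2017_on)
```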
Third Problem 🥳
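- We add `individuals` and `family_members` to get the total number of homeless people in each state.
- We group these totals by `region` and get each region's total with `sum()`.
- We sort the totals in ascending order. `plt.barh()` draws the first bar at the bottom, so the largest region ends up at the top and the bars read in descending order.
- We create a horizontal bar chart with `plt.barh()`.

```python
homelessness.head()
```

```python
# Total homeless per state, summed by region; ascending so barh puts the largest on top
per_state = homelessness['individuals'] + homelessness['family_members']
total_n_homeless = per_state.groupby(homelessness['region']).sum().sort_values()
plt.barh(total_n_homeless.index, total_n_homeless.values, align='center')
plt.xlabel('Total number of homeless people')
plt.show()
```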
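The aggregation behind this chart can be checked on a few invented rows before plotting. The sketch assumes the homelessness data has `individuals` and `family_members` columns whose sum is the total number of homeless people per state. The region names and counts are made up for illustration.

```python
import pandas as pd

# Invented rows; column names mirror homelessness.csv
# (assuming it has `individuals` and `family_members` columns)
homeless_toy = pd.DataFrame({
    "region": ["Pacific", "Pacific", "Mountain", "South Atlantic"],
    "individuals": [100.0, 50.0, 30.0, 80.0],
    "family_members": [20.0, 10.0, 40.0, 5.0],
})

# Total homeless people per row, then summed per region
totals = (
    homeless_toy.assign(total=homeless_toy["individuals"] + homeless_toy["family_members"])
    .groupby("region")["total"]
    .sum()
)

# plt.barh draws the first entry at the bottom, so an ascending sort
# places the largest region at the top of a horizontal bar chart
for_barh = totals.sort_values()

print(for_barh)
```

A vertical bar chart would need the opposite order. For that case, `sort_values(ascending=False)` puts the largest bar on the left.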