
Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.



Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here
import pandas as pd
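walmart = pd.read_csv('datasets/walmart.csv', index_col=0)
print(walmart.head())      # first five rows
print(walmart.describe())  # summary statistics for the numeric columns
print(walmart.shape)       # (number of rows, number of columns)
walmart.info()             # prints its report directly and returns None
print(walmart.index)
print(walmart.columns)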

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
  • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
  • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
  • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
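
The first two prompts above can be sketched on small stand-in data. This is only an illustration under stated assumptions: the toy frames below are made up, and the column names `department`, `weekly_sales`, `type`, and `year` are guesses about the course datasets (only `nb_sold` is named in the prompt), so adjust them to match the real DataFrames.

```python
import pandas as pd

# Made-up stand-in for the walmart DataFrame.
# Assumed column names: `department`, `weekly_sales`.
walmart_toy = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 4, 5, 6],
    "weekly_sales": [100.0, 250.0, 80.0, 60.0, 300.0, 20.0, 150.0, 10.0],
})

# Highest weekly sales per department, top five, in descending order.
top5 = (
    walmart_toy.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(5)
)
print(top5)

# Made-up stand-in for the avocado DataFrame.
# Assumed column names: `type`, `year`; `nb_sold` comes from the prompt.
avocado_toy = pd.DataFrame({
    "year": [2016, 2017, 2017, 2017],
    "type": ["organic", "organic", "conventional", "organic"],
    "nb_sold": [10.0, 20.0, 30.0, 5.0],
})

# Keep organic rows from 2017, then total the units sold.
organic_2017 = avocado_toy[
    (avocado_toy["type"] == "organic") & (avocado_toy["year"] == 2017)
]
total_organic_2017 = organic_2017["nb_sold"].sum()
print(total_organic_2017)
```

Grouping first and sorting the aggregated result keeps the ranking at one row per department, so `head(5)` returns departments rather than individual weeks.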
import pandas as pd
homelessness = pd.read_csv('datasets/homelessness.csv', index_col=0)
# Data exploration
print("Data exploration")
print(homelessness.head())
homelessness.info()  # prints its report directly and returns None
print(homelessness.columns)
print(homelessness.index)


# Sorting
print("""
-------------------------
Sort by a single variable
-------------------------
""")
homelessness_ind = homelessness.sort_values('individuals')

print(homelessness_ind)
print("""
------------------------------------------------
Sort by two variables with different sort orders
------------------------------------------------
""")

# Region ascending; within each region, family_members descending
homelessness_reg_fam = homelessness.sort_values(['region', 'family_members'], ascending=[True, False])
print(homelessness_reg_fam)

print("""
--------------------------------------------
Subset one column
--------------------------------------------
""")
individuals = homelessness['individuals']
print(individuals)
print("""
----------------------
""")


state = homelessness['state']
print(state)

print("""
--------------------------------------------
Subset two columns
--------------------------------------------
""")
ind_state = homelessness[['individuals', 'state']]
print(ind_state)

print("""
--------------------------------------------
Filter by values
--------------------------------------------
""")
# The row filters themselves are built in the next cell.
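
As a quick check of the sorting and subsetting steps in this cell, here is a minimal sketch on a four-row toy frame. The values are made up for illustration and are not taken from the homelessness data; only the column names are borrowed from the cell above.

```python
import pandas as pd

# Made-up toy data; only the column names mirror the homelessness DataFrame.
toy = pd.DataFrame({
    "region": ["Pacific", "Mountain", "Pacific", "Mountain"],
    "state": ["California", "Arizona", "Oregon", "Utah"],
    "individuals": [500, 100, 300, 200],
    "family_members": [50, 30, 40, 10],
})

# Sort by one column (ascending by default).
by_ind = toy.sort_values("individuals")

# Sort by two columns: region ascending, family_members descending.
by_reg_fam = toy.sort_values(["region", "family_members"], ascending=[True, False])

# Single brackets return a Series; double brackets return a DataFrame.
state_series = toy["state"]
ind_state_df = toy[["individuals", "state"]]

print(by_ind["state"].tolist())
print(by_reg_fam["state"].tolist())
print(type(state_series).__name__, type(ind_state_df).__name__)
```

`sort_values` returns a new DataFrame and leaves `toy` unchanged, which is why each result is assigned to its own name.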


import pandas as pd
homelessness = pd.read_csv('datasets/homelessness.csv', index_col=0)  # same index column as the first load

# Filtering
print("""
-----------------------------
Filter by quantity
-----------------------------
""")

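# Index the DataFrame with the boolean mask so the result holds the matching rows, not just True/False
ind_gt_10k = homelessness[homelessness['individuals'] > 10000]
print(ind_gt_10k)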

print("""
-----------------------------
Filter by field value
-----------------------------
""")


alabama_reg = homelessness[homelessness['state'] == 'Alabama']
print(alabama_reg)
homelessness.info()  # prints its report directly and returns None

print("""
-----------------------------
Filter by name and quantity
-----------------------------
""")


# Fewer than 1,000 family members (matching the variable name) in the Pacific region
fam_lt_1k_pac = homelessness[(homelessness['family_members'] < 1000) & (homelessness['region'] == 'Pacific')]
print(fam_lt_1k_pac)

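print("""
-----------------------------
Filter by two values of the same field using isin
-----------------------------
""")

# Call isin on the region column with a flat list, then use the mask to select rows
south_mid_atlantic_isin = homelessness[homelessness['region'].isin(['South Atlantic', 'Mid-Atlantic'])]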
print(south_mid_atlantic_isin)

print("""
-----------------------------
Filter by two values of the same field using the | operator
-----------------------------
""")

south_mid_atlantic = homelessness[(homelessness['region'] == 'South Atlantic') | (homelessness['region'] == 'Mid-Atlantic')]
print(south_mid_atlantic)

print("""
-----------------------------
Filter by a list of values
-----------------------------
""")

canu = ["California", "Arizona", "Nevada", "Utah"]

mojave_homelessness_isin2 = homelessness[homelessness['state'].isin(canu)]

print(mojave_homelessness_isin2)
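
Two points from this cell are easy to mix up: a comparison on a column yields a boolean mask rather than rows, and `isin` on a single column is equivalent to chaining `==` tests with `|`. The sketch below shows both on a toy frame; the numbers are made up for illustration and are not taken from the homelessness data.

```python
import pandas as pd

# Made-up toy data; only the column names mirror the homelessness DataFrame.
toy = pd.DataFrame({
    "region": ["South Atlantic", "Mid-Atlantic", "Pacific", "Mountain"],
    "state": ["Florida", "New York", "California", "Nevada"],
    "individuals": [300, 12000, 50000, 700],
})

# A comparison returns a boolean Series (the mask), one value per row.
mask = toy["individuals"] > 10000

# Indexing the DataFrame with the mask keeps only the rows where it is True.
rows = toy[mask]

# Two equivalent ways to keep rows whose region is one of several values.
regions = ["South Atlantic", "Mid-Atlantic"]
via_isin = toy[toy["region"].isin(regions)]
via_or = toy[(toy["region"] == "South Atlantic") | (toy["region"] == "Mid-Atlantic")]

print(mask.tolist())
print(rows["state"].tolist())
print(via_isin.equals(via_or))
```

`isin` tends to read better once there are more than two values to match, since the list can grow without adding another `|` clause.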