
Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.



Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

How to summarize one or more DataFrame columns with a user-defined function

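import pandas as pd

# User-defined aggregation: return the 40th percentile of a column (Series).
def pct40(column):
    return column.quantile(0.4)

dictionary = {
    "names": ["Jenny", "Max", "Tobby", "Tom", "Alan"],
    "heights": [1.8, 1.7, 1.5, 1.93, 2],
    "weights": [70, 65, 80, 120, 90],
}
df = pd.DataFrame(dictionary)

# Apply the custom function and the built-in mean to both numeric columns.
# The string "mean" avoids the warning recent pandas versions raise for np.mean.
df_quad40 = df[["heights", "weights"]].agg([pct40, "mean"])
print(df_quad40)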
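
The same pattern extends to several user-defined functions at once. The sketch below is only an illustration: the helper `iqr` (interquartile range) is a hypothetical addition that is not part of the original notes, and the data is the same small example used above. Each function becomes one row of the result and each selected column stays a column.

```python
import pandas as pd

# 40th percentile, as in the notes above.
def pct40(column):
    return column.quantile(0.4)

# Hypothetical extra helper (not in the original notes): interquartile range.
def iqr(column):
    return column.quantile(0.75) - column.quantile(0.25)

df = pd.DataFrame({
    "heights": [1.8, 1.7, 1.5, 1.93, 2.0],
    "weights": [70, 65, 80, 120, 90],
})

# One row per aggregation, one column per selected DataFrame column.
summary = df[["heights", "weights"]].agg([pct40, iqr, "mean"])
print(summary)

# Individual values can be read back by row label (the function name) and column.
weights_pct40 = summary.loc["pct40", "weights"]
```

Passing the functions in a list keeps the row labels equal to the function names, which makes the result easy to index with `.loc`.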

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
  • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
  • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
  • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
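
One possible approach to the first two exercises is sketched below. The real `walmart` and `avocado` DataFrames come from the hidden import cell, so this sketch uses tiny made-up stand-ins, and the column names `department`, `weekly_sales`, `type`, `year` and `nb_sold` are assumptions that should be checked against the actual data before reuse.

```python
import pandas as pd

# Toy stand-ins for the course DataFrames; values and column names are assumed.
walmart = pd.DataFrame({
    "department": [1, 1, 2, 2, 3],
    "weekly_sales": [100.0, 250.0, 80.0, 300.0, 50.0],
})
avocado = pd.DataFrame({
    "type": ["organic", "organic", "conventional", "organic"],
    "year": [2017, 2017, 2017, 2016],
    "nb_sold": [10.0, 15.0, 100.0, 7.0],
})

# Highest weekly sales per department, top five in descending order.
top_sales = (
    walmart.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(5)
)
print(top_sales)

# Total nb_sold of organic avocados in 2017: filter rows, then sum the column.
is_organic_2017 = (avocado["type"] == "organic") & (avocado["year"] == 2017)
total_organic_2017 = avocado.loc[is_organic_2017, "nb_sold"].sum()
print(total_organic_2017)
```

The two plotting exercises can start from a grouped and sorted Series like `top_sales` and call its `.plot()` method.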