
Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.



Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

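.head()      # first few rows
.info()      # column names, dtypes, and non-null counts
.shape       # This is an attribute, not a method. Methods have parentheses.
.describe()  # summary statistics for the numeric columns
.columns
.index       # note: the row labels are called the "index", not "rows"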
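A minimal sketch of the method/attribute distinction in the list above. The DataFrame `df` and its columns are made up for illustration and are not the course data.

```python
import pandas as pd

# Hypothetical data, invented for this example only.
df = pd.DataFrame({
    "item": ["apple", "banana", "orange", "apple"],
    "quantity": [10, 55, 70, 20],
})

# Methods are called with parentheses.
print(df.head(2))
df.info()
print(df.describe())

# Attributes are read without parentheses.
n_rows, n_cols = df.shape
column_names = list(df.columns)
row_labels = list(df.index)
```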

.sort_values("column", ascending=False)  # To sort by multiple columns, pass lists: .sort_values(["column1", "column2"], ascending=[True, False])
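As a small sketch of single-column and multi-column sorting, assuming a made-up DataFrame with `item` and `quantity` columns (hypothetical names, not from the course data):

```python
import pandas as pd

# Hypothetical data, invented for this example only.
df = pd.DataFrame({
    "item": ["apple", "banana", "apple", "banana"],
    "quantity": [10, 55, 20, 5],
})

# Sort by one column, largest values first.
by_quantity = df.sort_values("quantity", ascending=False)

# Sort by item (A to Z), then by quantity (largest first) within each item.
by_item_then_quantity = df.sort_values(
    ["item", "quantity"], ascending=[True, False]
)
```

The `ascending` list lines up position by position with the list of column names.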

df["column"] > 50 # will produce a column of Trues and Falses
df[df["column"] > 50] # subset the rows that match the condition (get original values)
df[df["column"] == "text_that_matches"]
df[df["column"] == "2015-09-30"]  # Date goes in quotes

df[(df["column1"] == 'type2') & (df["column2"] == 'type5')]

string = ["apple", "banana", "orange"]
df[df["fruit"].isin(string)]
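The subsetting patterns above can be tried on a toy table. This is only a sketch: the data is invented, and the `date` column is stored as plain strings here.

```python
import pandas as pd

# Hypothetical data, invented for this example only.
df = pd.DataFrame({
    "fruit": ["apple", "kiwi", "orange", "banana"],
    "quantity": [10, 55, 70, 20],
    "date": ["2015-09-30", "2015-09-30", "2015-10-01", "2015-10-02"],
})

# The comparison alone gives a Boolean Series (the "mask").
mask = df["quantity"] > 50

# Putting the mask inside df[...] keeps only the rows where it is True.
over_50 = df[mask]
on_date = df[df["date"] == "2015-09-30"]

# Combine conditions with & and wrap each one in parentheses.
both = df[(df["quantity"] > 50) & (df["date"] == "2015-09-30")]

# .isin() checks membership in a list of allowed values.
listed = df[df["fruit"].isin(["apple", "banana", "orange"])]
```

Note that the condition goes inside the square brackets; comparing after the brackets close would not filter any rows.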


df = df.drop_duplicates(subset="item")


df["item"].value_counts()
df["item"].value_counts(sort=True) # sort=True is the default: most frequent value first
df["item"].value_counts(normalize=True) # turns counts into proportion of the total


store_counts = store_types["type"].value_counts()
store_props = store_types["type"].value_counts(normalize=True)
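A sketch of how dropping duplicates and counting fit together. The raw table `stores` and its values are hypothetical; how `store_types` is built in the course data is an assumption here.

```python
import pandas as pd

# Hypothetical raw data: store 1 appears twice.
stores = pd.DataFrame({
    "store": [1, 1, 2, 3, 4],
    "type": ["A", "A", "A", "B", "A"],
})

# Keep one row per store so each store is counted once.
store_types = stores.drop_duplicates(subset="store")

# Counts per type, then the same counts as proportions of the total.
store_counts = store_types["type"].value_counts()
store_props = store_types["type"].value_counts(normalize=True)
```

Without the `drop_duplicates` step, store 1 would be counted twice and type A would be overstated.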

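df.groupby("item")["quantity"].mean()  # one summary statistic per group
df.groupby("item")["quantity"].agg([min, max, sum])  # several statistics at once

df.groupby(["item", "size"])["quantity"].agg([min, max, sum])  # group by more than one column

# Python has no built-in mean or median, so use NumPy for those summary statistics,
# i.e. pass np.mean (not mean) to .agg()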
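A minimal grouped-summary sketch on invented data. It passes the statistics to `.agg()` as string names, which pandas also accepts; this is a swapped-in alternative to the function objects used in the notes, and recent pandas releases may emit a FutureWarning when given the functions instead.

```python
import pandas as pd

# Hypothetical data, invented for this example only.
df = pd.DataFrame({
    "item": ["shirt", "shirt", "hat", "hat"],
    "size": ["S", "M", "S", "S"],
    "quantity": [2, 4, 1, 3],
})

# One statistic per group.
mean_by_item = df.groupby("item")["quantity"].mean()

# Several statistics per group, one result column per statistic.
stats_by_item = df.groupby("item")["quantity"].agg(["min", "max", "sum"])

# Grouping by two columns gives a two-level row index.
stats_by_item_size = df.groupby(["item", "size"])["quantity"].agg(
    ["min", "max", "sum"]
)
```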



# Import numpy with the alias np
import numpy as np

# For each store type, aggregate weekly_sales: get min, max, mean, and median
sales_stats = sales.groupby("type")["weekly_sales"].agg([min, max, np.mean, np.median])

# Print sales_stats
print(sales_stats)

# For each store type, aggregate unemployment and fuel_price_usd_per_l: get min, max, mean, and median
unemp_fuel_stats = sales.groupby("type")[["unemployment", "fuel_price_usd_per_l"]].agg([min, max, np.mean, np.median])

# Print unemp_fuel_stats
print(unemp_fuel_stats)
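The comments above describe grouping by store type and then aggregating one or more columns. A sketch of that pattern on a tiny stand-in for `sales` follows; the values are invented, and the statistics are given as string names rather than NumPy functions.

```python
import pandas as pd

# Hypothetical stand-in for the course's sales data.
sales = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "weekly_sales": [100.0, 300.0, 50.0, 150.0],
    "unemployment": [8.0, 7.0, 9.0, 9.5],
    "fuel_price_usd_per_l": [0.70, 0.80, 0.75, 0.85],
})

stats = ["min", "max", "mean", "median"]

# Group by type only, then select the single column to aggregate.
sales_stats = sales.groupby("type")["weekly_sales"].agg(stats)

# Select two columns with a list (double brackets) to aggregate both.
unemp_fuel_stats = sales.groupby("type")[
    ["unemployment", "fuel_price_usd_per_l"]
].agg(stats)

print(sales_stats)
print(unemp_fuel_stats)
```

Only the grouping key goes inside `groupby()`. Putting the value columns there as well would create one group per distinct value instead of one group per store type.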

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
  • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
  • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
  • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.