Data Manipulation with pandas
Run the hidden code cell below to import the data used in this course.
1 hidden cell
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
.head()
.info()
.shape # This is an attribute, not a method. Methods have parentheses.
.describe()
.columns
.index # note, we did not use "rows"
.sort_values("column", ascending=False) # For multiple columns: ["column1", "column2"] with ascending=[True, False]
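A minimal sketch of the multi-column sort noted above. The `dogs` DataFrame and its column names are hypothetical toy data, not part of the course datasets.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
dogs = pd.DataFrame({
    "breed": ["Lab", "Poodle", "Lab", "Chow"],
    "weight_kg": [29, 24, 25, 24],
})

# Sort by weight (heaviest first), then by breed (A to Z) to break ties
sorted_dogs = dogs.sort_values(["weight_kg", "breed"], ascending=[False, True])
print(sorted_dogs)
```

The two lists are matched by position, so `weight_kg` is sorted descending and `breed` ascending.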
df["column"] > 50 # produces a boolean Series (True/False for each row)
df[df["column"] > 50] # subset the rows that match the condition (returns the original values)
df[df["column"] == "text_that_matches"]
df[df["column"] == "2015-09-30"] # dates go in quotes
df[(df["column1"] == 'type2') & (df["column2"] == 'type5')]
string = ["apple", "banana", "orange"]
df[df["fruit"].isin(string)]
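A small sketch of the subsetting patterns above, assuming a made-up DataFrame (the `quantity` values and the fruit rows are hypothetical). The key point is that the comparison goes inside the outer brackets, so the boolean Series selects rows.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "fruit": ["apple", "kiwi", "banana", "orange"],
    "column1": ["type2", "type2", "type1", "type2"],
    "column2": ["type5", "type3", "type5", "type5"],
    "quantity": [60, 40, 75, 20],
})

# Step 1: the comparison alone gives a boolean Series
mask = df["quantity"] > 50

# Step 2: passing that Series to df[...] keeps only the True rows
over_50 = df[mask]

# Combine conditions with & and wrap each one in parentheses
both = df[(df["column1"] == "type2") & (df["column2"] == "type5")]

# .isin() keeps rows whose value appears in a list
in_list = df[df["fruit"].isin(["apple", "banana", "orange"])]

print(over_50)
print(both)
print(in_list)
```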
df = df.drop_duplicates(subset="item")
df["item"].value_counts()
df["item"].value_counts(sort=True)
df["item"].value_counts(normalize=True) # turns counts into proportion of the total
store_counts = store_types["type"].value_counts()
store_props = store_types["type"].value_counts(normalize=True)
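A quick sketch of why duplicates are dropped before counting, using a hypothetical `store_types_raw` table in which store 1 appears twice. The column names `store` and `type` are assumptions for this toy example.

```python
import pandas as pd

# Hypothetical toy data: store 1 is listed twice
store_types_raw = pd.DataFrame({
    "store": [1, 1, 2, 3, 4],
    "type": ["A", "A", "B", "A", "A"],
})

# Keep one row per store so each store is counted once
unique_stores = store_types_raw.drop_duplicates(subset="store")

# Counts per type, then the same counts as proportions of the total
counts = unique_stores["type"].value_counts()
props = unique_stores["type"].value_counts(normalize=True)

print(counts)
print(props)
```

Without the `drop_duplicates` step, type A would be counted four times instead of three.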
df.groupby("item")["quantity"].mean()
df.groupby("item")["quantity"].agg([min,max,sum])
df.groupby(["item", "size"])["quantity"].agg([min,max,sum])
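A minimal sketch of grouped summaries on hypothetical toy data. It passes the function names as strings (`"min"`, `"max"`, `"sum"`), which pandas also accepts and which avoids the deprecation warning that newer pandas versions may raise for built-in functions.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
orders = pd.DataFrame({
    "item": ["tea", "tea", "coffee", "coffee"],
    "size": ["S", "L", "S", "S"],
    "quantity": [2, 4, 1, 5],
})

# One statistic per group
mean_qty = orders.groupby("item")["quantity"].mean()

# Several statistics per group: one column per statistic
stats = orders.groupby("item")["quantity"].agg(["min", "max", "sum"])

# Grouping by two columns gives a two-level index
by_item_size = orders.groupby(["item", "size"])["quantity"].agg(["min", "max", "sum"])

print(mean_qty)
print(stats)
print(by_item_size)
```

The pattern is always: group by the category column(s), select the column to summarize, then aggregate.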
# Use NumPy functions for summary statistics that Python has no built-in for,
# e.g. np.mean rather than mean (there is no built-in mean())
# Import numpy with the alias np
import numpy as np
# For each store type, aggregate weekly_sales: get min, max, mean, and median
sales_stats = sales.groupby("type")["weekly_sales"].agg([min, max, np.mean, np.median])
# Print sales_stats
print(sales_stats)
# For each store type, aggregate unemployment and fuel_price_usd_per_l: get min, max, mean, and median
unemp_fuel_stats = sales.groupby("type")[["unemployment", "fuel_price_usd_per_l"]].agg([min, max, np.mean, np.median])
# Print unemp_fuel_stats
print(unemp_fuel_stats)
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Print the highest weekly sales for each `department` in the `walmart` DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
- What was the total `nb_sold` of organic avocados in 2017 in the `avocado` DataFrame? If you're stuck, try reviewing this video.
- Create a bar plot of the total number of homeless people by region in the `homelessness` DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
- Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
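One possible approach to the first exercise, sketched on hypothetical toy data rather than the real `walmart` DataFrame. The `weekly_sales` column name is an assumption based on the notes above, and the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the walmart DataFrame
walmart_toy = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 4, 5, 6],
    "weekly_sales": [100.0, 250.0, 90.0, 80.0, 300.0, 50.0, 400.0, 10.0],
})

# Highest weekly sales per department, largest first, top five only
top_five = (
    walmart_toy.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(5)
)
print(top_five)
```

The same group, aggregate, sort, and head chain should carry over to the real data if the column names match.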