Data Manipulation with pandas
Run the hidden code cell below to import the data used in this course.
1 hidden cell
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
# Add your code snippets here
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Print the highest weekly sales for each
department
in thewalmart
DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video. - What was the total
nb_sold
of organic avocados in 2017 in theavocado
DataFrame? If you're stuck, try reviewing this video. - Create a bar plot of the total number of homeless people by region in the
homelessness
DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video. - Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
Calculating on a pivot table Pivot tables are filled with summary statistics, but they are only a first step to finding something insightful. Often you'll need to perform further calculations on them. A common thing to do is to find the rows or columns where the highest or lowest value occurs.
Recall from Chapter 1 that you can easily subset a Series or DataFrame to find rows of interest using a logical condition inside of square brackets. For example: series[series > value]
.
pandas is loaded as pd and the DataFrame temp_by_country_city_vs_year
is available.
Calculate the mean temperature for each year, assigning to mean_temp_by_year.
Filter mean_temp_by_year
for the year that had the highest mean temperature.
Calculate the mean temperature for each city (across columns), assigning to mean_temp_by_city.
Filter mean_temp_by_city
for the city that had the lowest mean temperature.
Answer:
# Get the worldwide mean temp by year
mean_temp_by_year = temp_by_country_city_vs_year.mean()
# Filter for the year that had the highest mean temp
print(mean_temp_by_year[mean_temp_by_year == mean_temp_by_year.max()])
# Get the mean temp by city
mean_temp_by_city = temp_by_country_city_vs_year.mean(axis ="columns")
# Filter for the city that had the lowest mean temp
print(mean_temp_by_city[mean_temp_by_city == mean_temp_by_city.min()])