
Data Manipulation with pandas

Run the hidden code cell below to import the data used in this course.



Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here
import matplotlib.pyplot as plt  # needed for the plt.* calls below
import numpy as np
import pandas as pd
import seaborn as sns

# Highest weekly sales per department, top five in descending order
highest_sales_department = walmart.groupby('department')['weekly_sales'].max().sort_values(ascending=False).head(5)
print(highest_sales_department)

# Total nb_sold of organic avocados in 2017
total_2017_organic_avo = avocado[
    (avocado['year'] == 2017) &
    (avocado['type'] == 'organic')
]['nb_sold'].sum().round()

print(f"Total number of organic avocados sold in 2017: {total_2017_organic_avo}")

# Total homeless by region
homeless_by_region = (
    homelessness.groupby('region')['individuals']
    .sum()
    .sort_values(ascending=False)
)


# Vertical bar chart
homeless_by_region.plot(kind='bar', title='Total Homeless People by Region')
plt.ylabel('Total Homeless People')
plt.show()

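# Horizontal bar chart (Bonus)
# barh draws the first row at the bottom, so sort ascending to put the largest region on top
homeless_by_region.sort_values().plot(kind='barh', title='Total Homeless People by Region')
plt.xlabel('Total Homeless People')
plt.show()
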
# Filter for the desired cities
filtered_temperatures = temperatures[temperatures['city'].isin(['Toronto', 'Rome'])]

# Create the line plot
plt.figure(figsize=(12, 6))
sns.lineplot(
    data=filtered_temperatures,
    x='date',
    y='avg_temp_c',
    hue='city',
    style='city',
    markers=True,
    dashes=False
)

# Add title and labels
plt.title('Average Temperatures in Toronto and Rome Over Time', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Average Temperature (°C)', fontsize=12)

# Add grid and save the figure
plt.grid(True)
plt.legend(title='Cities')
plt.savefig('temperatures_toronto_rome.png', dpi=300, bbox_inches='tight')
plt.show()

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Print the highest weekly sales for each department in the walmart DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this video.
  • What was the total nb_sold of organic avocados in 2017 in the avocado DataFrame? If you're stuck, try reviewing this video.
  • Create a bar plot of the total number of homeless people by region in the homelessness DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this video.
  • Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this video.
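
The first two exercises can be tried without the course data. The sketch below uses two small, made-up DataFrames (`toy_sales` and `toy_avocado` are hypothetical stand-ins, and their values are invented) that borrow only the column names from the exercises. It shows the group, aggregate, sort, and limit pattern, and then a boolean filter followed by a sum.

```python
import pandas as pd

# Hypothetical stand-in for the walmart DataFrame (values invented for illustration).
toy_sales = pd.DataFrame({
    "department": [1, 1, 2, 2, 3, 3],
    "weekly_sales": [100.0, 250.0, 400.0, 50.0, 300.0, 120.0],
})

# Highest weekly sales per department, largest first, limited to the top two.
top_departments = (
    toy_sales.groupby("department")["weekly_sales"]
    .max()
    .sort_values(ascending=False)
    .head(2)
)
print(top_departments)

# Hypothetical stand-in for the avocado DataFrame (values invented for illustration).
toy_avocado = pd.DataFrame({
    "year": [2016, 2017, 2017, 2017],
    "type": ["organic", "organic", "conventional", "organic"],
    "nb_sold": [10.0, 20.5, 99.0, 30.25],
})

# Combine two conditions with &, wrapping each one in parentheses.
is_2017_organic = (toy_avocado["year"] == 2017) & (toy_avocado["type"] == "organic")

# Select the matching rows and the nb_sold column in one .loc call, then sum.
total_sold = toy_avocado.loc[is_2017_organic, "nb_sold"].sum()
print(total_sold)
```

Using `.loc` with a named boolean mask is one way to write the filter; chained indexing as in the notes above gives the same total here.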