Introduction to Data Visualization with Matplotlib
Run the hidden code cell below to import the data used in this course.
# Importing the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Importing the course datasets 
climate_change = pd.read_csv('datasets/climate_change.csv', parse_dates=["date"], index_col="date")
medals = pd.read_csv('datasets/medals_by_country_2016.csv', index_col=0)
summer_2016 = pd.read_csv('datasets/summer2016.csv')
austin_weather = pd.read_csv("datasets/austin_weather.csv", index_col="DATE")
weather = pd.read_csv("datasets/seattle_weather.csv", index_col="DATE")
# Some pre-processing on the weather datasets, including adding a month column
seattle_weather = weather[weather["STATION"] == "USW00094290"].copy()  # copy so adding MONTH below does not raise SettingWithCopyWarning
month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] 
seattle_weather["MONTH"] = month 
austin_weather["MONTH"] = month
Take Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
# Add your code snippets here
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Using austin_weather and seattle_weather, create a Figure with an array of two Axes objects that share a y-axis range (MONTHS in this case). Plot Seattle's and Austin's MLY-TAVG-NORMAL (for average temperature) in the top Axes and plot their MLY-PRCP-NORMAL (for average precipitation) in the bottom Axes. The cities should have different colors and the line style should be different between precipitation and temperature. Make sure to label your viz!
- Using climate_change, create a twin Axes object with the shared x-axis as time. There should be two lines of different colors not sharing a y-axis: co2 and relative_temp. Only include dates from the 2000s and annotate the first date at which co2 exceeded 400.
- Create a scatter plot from medals comparing the number of Gold medals vs the number of Silver medals with each point labeled with the country name.
- Explore if the distribution of Age varies in different sports by creating histograms from summer_2016.
- Try out the different Matplotlib styles available and save your visualizations as a PNG file.
#1
# Two stacked Axes that share a y-axis range
fig, ax = plt.subplots(2, 1, sharey=True)
# Top Axes: average temperature, solid lines, one color per city
ax[0].plot(austin_weather["MONTH"], austin_weather["MLY-TAVG-NORMAL"], color="blue", linestyle="-", label="Austin")
ax[0].plot(seattle_weather["MONTH"], seattle_weather["MLY-TAVG-NORMAL"], color="red", linestyle="-", label="Seattle")
ax[0].set_title("Average Temperature")
ax[0].set_xlabel("Month")
ax[0].set_ylabel("Temperature (F)")
ax[0].legend()
# Bottom Axes: average precipitation, dotted lines, same city colors
ax[1].plot(austin_weather["MONTH"], austin_weather["MLY-PRCP-NORMAL"], color="blue", linestyle=":", label="Austin")
ax[1].plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"], color="red", linestyle=":", label="Seattle")
ax[1].set_title("Average Precipitation")
ax[1].set_xlabel("Month")
ax[1].set_ylabel("Precipitation (in)")
ax[1].legend()
# Figure-level title: plt.xlabel/plt.ylabel/plt.title would overwrite the labels of the bottom Axes
fig.suptitle("Weather Data for Austin and Seattle")
fig.tight_layout()
plt.show()
climate_change
#2
# Keep only dates from 2000 onward
recent = climate_change.loc["2000-01-01":]
# Create one Axes plus a twin Axes that shares its x-axis (time)
fig, ax = plt.subplots()
ax2 = ax.twinx()
# Plot co2 and relative_temp in different colors, each on its own y-axis
ax.plot(recent.index, recent["co2"], color="blue")
ax2.plot(recent.index, recent["relative_temp"], color="red")
# Annotate the first date at which co2 exceeded 400 (looked up from the data, not hard-coded)
first_date = recent[recent["co2"] > 400].index[0]
ax.annotate("CO2 exceeded 400 ppm", xy=(first_date, recent.loc[first_date, "co2"]),
            xytext=(pd.Timestamp("2004-01-01"), 395), arrowprops={"arrowstyle": "->"})
# Label the axes, coloring each y-label to match its line
ax.set_xlabel("Date")
ax.set_ylabel("CO2 (ppm)", color="blue")
ax2.set_ylabel("Relative Temperature (°C)", color="red")
# Show the figure
plt.show()
medals
#3
plt.scatter(medals["Gold"], medals["Silver"], c="blue", label="Country")
# Label each point with the country name
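for country, row in medals.iterrows():
    # Look up values by row label: medals is indexed by country name, so integer lookups like medals["Gold"][i] are deprecated
    plt.annotate(country, (row["Gold"], row["Silver"]), xytext=(5, 5), textcoords="offset points", fontsize=14)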
# Label the axes
plt.xlabel("Gold Medals")
plt.ylabel("Silver Medals")
# Add a legend
plt.legend()
# Show the figure
plt.show()
summer_2016
#4
# Create a histogram of the age distribution for each sport
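# Loop over each sport once (iterating the raw column would repeat a sport for every athlete)
for sport in summer_2016["Sport"].dropna().unique():
    ages = summer_2016[summer_2016["Sport"] == sport]["Age"].dropna()
    plt.hist(ages, bins=50)
    plt.title(sport)
    plt.xlabel("Age")
    plt.ylabel("Frequency")
    # Show inside the loop so each sport gets its own figure instead of all histograms overlapping
    plt.show()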
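#5
The last exercise (trying out Matplotlib styles and saving as PNG) has no solution cell above, so here is a minimal sketch of one way to approach it. The data is a hypothetical stand-in so the cell runs on its own, and the file names and the two style names chosen ("ggplot" and "grayscale") are just examples. In the notebook you would likely plot one of the course DataFrames instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data (not from the course datasets): a smooth yearly temperature curve
months = np.arange(1, 13)
temps = 60 + 20 * np.sin((months - 4) * np.pi / 6)

# List a few of the style names installed with this Matplotlib version
print(plt.style.available[:5])

saved_files = []
for style in ["ggplot", "grayscale"]:
    # style.context applies the style only inside the with-block,
    # so later plots are not affected
    with plt.style.context(style):
        fig, ax = plt.subplots()
        ax.plot(months, temps, marker="o")
        ax.set_xlabel("Month")
        ax.set_ylabel("Temperature (F)")
        ax.set_title(f"Style: {style}")
        # The file format is inferred from the .png extension
        filename = f"temperature_{style}.png"
        fig.savefig(filename, dpi=300)
        plt.close(fig)  # free the figure once it has been written to disk
        saved_files.append(filename)
```

Using plt.style.use(style) instead would switch the style for the rest of the session, which may be what you want if every figure in the notebook should share one look.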