Introduction to Data Visualization with Matplotlib
Run the hidden code cell below to import the data used in this course.
# Importing the course packages
import pandas as pd
import numpy as np
# Importing the course datasets
climate_change = pd.read_csv('datasets/climate_change.csv', parse_dates=["date"], index_col="date")
medals = pd.read_csv('datasets/medals_by_country_2016.csv', index_col=0)
summer_2016 = pd.read_csv('datasets/summer2016.csv')
austin_weather = pd.read_csv("datasets/austin_weather.csv", index_col="DATE")
weather = pd.read_csv("datasets/seattle_weather.csv", index_col="DATE")
# Some pre-processing on the weather datasets, including adding a month column
seattle_weather = weather[weather["STATION"] == "USW00094290"].copy()
month = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
seattle_weather["MONTH"] = month
austin_weather["MONTH"] = month
Visualizing data in plots and figures exposes the underlying patterns in the data and provides insights. Good visualizations also help you communicate your data to others, and are useful to data analysts and other consumers of the data.
Matplotlib provides the building blocks to create rich visualizations of many different kinds of datasets. You will learn how to create visualizations for different kinds of data and how to customize, automate, and share these visualizations.
Introduction to Matplotlib
This chapter introduces the Matplotlib visualization library and demonstrates how to use it with data.
Importing the pyplot interface
import matplotlib.pyplot as plt
The fig, ax = plt.subplots() command, when called without any inputs, creates two different objects:
- a Figure object and
- an Axes object.
The Figure object is a container that holds everything that you see on the page.
Meanwhile, the Axes is the part of the page that holds the data. It is the canvas on which we will draw with our data, to visualize it.
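As a minimal sketch of the relationship described above, the snippet below creates the two objects and inspects them. It uses only the Figure and Axes classes that Matplotlib itself exposes; nothing is plotted yet.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# Called without arguments, subplots() returns one Figure and one Axes
fig, ax = plt.subplots()

print(isinstance(fig, Figure))  # True: the container for the whole page
print(isinstance(ax, Axes))     # True: the area that will hold the data
print(ax.figure is fig)         # True: the Axes belongs to this Figure
print(fig.axes == [ax])         # True: the Figure holds exactly this one Axes
```

Keeping both names is useful because plotting commands are called on ax, while settings that apply to the whole page belong to fig.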
Here, you can see a Figure with empty Axes. No data has been added yet.
fig, ax = plt.subplots()
Here is some data. This is a DataFrame that contains information about the weather in the city of Seattle in the different months of the year.
The "MONTH" column contains the three-letter names of the months of the year.
seattle_weather["MONTH"]
The "monthly average normal temperature" column contains the temperatures in these months, in degrees Fahrenheit, averaged over a ten-year period.
seattle_weather["MLY-TAVG-NORMAL"]
To add the data to the Axes, we call a plotting command. The plotting commands are methods of the Axes object.
For example, here we call the plot method with the month column as the first argument and the temperature column as the second argument. Finally, we call the plt.show() function to display the result of the plotting command.
This adds a line to the plot. The horizontal dimension of the plot represents the months according to their order and the height of the line at each month represents the average temperature. The trends in the data are now much clearer than they were just by reading off the temperatures from the table.
fig, ax = plt.subplots()
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-TAVG-NORMAL"])
plt.show()
If you want, you can add more data to the plot. For example, we also have a table that stores data about the average temperatures in the city of Austin, Texas. We add these data to the Axes by calling the plot method again.
First, we create the Figure and the Axes objects. We call the Axes method plot to add first the Seattle temperatures, and then the Austin temperatures to the Axes. Finally, we ask Matplotlib to show us the figure.
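The steps just described can be sketched as follows. To keep the example self-contained, it builds two small DataFrames by hand instead of reading the course files. The names seattle_demo and austin_demo and all temperature values are hypothetical placeholders, not the course data; only the column names mirror the ones used above.

```python
import matplotlib.pyplot as plt
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical stand-ins for seattle_weather and austin_weather
# (made-up values in degrees Fahrenheit, for illustration only)
seattle_demo = pd.DataFrame({
    "MONTH": months,
    "MLY-TAVG-NORMAL": [42, 44, 47, 51, 57, 62, 66, 67, 62, 54, 46, 41],
})
austin_demo = pd.DataFrame({
    "MONTH": months,
    "MLY-TAVG-NORMAL": [50, 54, 61, 68, 76, 82, 85, 86, 80, 71, 60, 52],
})

# Create the Figure and the Axes
fig, ax = plt.subplots()

# Each call to plot adds one more line to the same Axes
ax.plot(seattle_demo["MONTH"], seattle_demo["MLY-TAVG-NORMAL"])
ax.plot(austin_demo["MONTH"], austin_demo["MLY-TAVG-NORMAL"])

print(len(ax.lines))  # 2: one line per plot call

# Ask Matplotlib to show the figure
plt.show()
```

Because both calls target the same ax, the two lines share one set of axes and can be compared directly.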