Introduction to Data Visualization with Matplotlib

Run the hidden code cell below to import the data used in this course.



Chapter 1 - Introduction

import matplotlib.pyplot as plt
fig, ax = plt.subplots()  # create a Figure containing a single Axes
ax.plot(x_data, y_data)  # x_data, y_data: placeholder sequences of equal length
plt.show()
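As a minimal runnable sketch of the pattern above, the placeholders can be replaced with real sequences. The month names and temperature values here are made up for illustration and are not taken from the course datasets.

```python
import matplotlib.pyplot as plt

# Made-up illustrative values (not from the course data).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
avg_temp = [5.1, 6.3, 8.2, 10.7, 13.9, 16.5]

fig, ax = plt.subplots()    # one Figure containing one Axes
ax.plot(months, avg_temp)   # x values first, then y values
plt.show()
```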

ax.plot(x_data, y_data, marker="o") adds circle markers to the line; "v" gives triangles, etc.
ax.plot(x_data, y_data, linestyle="--") draws a dashed line; linestyle="None" removes the line.
ax.plot(x_data, y_data, color="r") makes the line red.
ax.set_xlabel("label") sets the x-axis label.
ax.set_title("title") sets the title.
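The styling keywords can be combined in a single plot call. The sketch below uses made-up numbers and arbitrary label text purely to show the options together.

```python
import matplotlib.pyplot as plt

# Made-up illustrative values.
x = [1, 2, 3, 4, 5]
y = [2.0, 3.5, 3.0, 4.5, 5.0]

fig, ax = plt.subplots()
# Circle markers, dashed line, red colour in one call.
ax.plot(x, y, marker="o", linestyle="--", color="r")
ax.set_xlabel("Time (arbitrary units)")
ax.set_ylabel("Value")
ax.set_title("Customized line plot")
plt.show()
```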

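fig, ax = plt.subplots(3, 2) -- Creates small multiples: a grid of 3 rows and 2 columns. ax is then an array of Axes.
ax[0, 0].plot(x_data, y_data) -- Same as above, but the index specifies which small multiple to draw on.
fig, ax = plt.subplots(3, 2, sharey=True) -- All subplots share the same y-axis scale.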
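A short sketch of small multiples with a shared y-axis, using made-up values. Only two of the six Axes are drawn on, to show how the row and column index selects a panel.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]  # made-up illustrative values

# 3 rows x 2 columns; ax is a 2-D NumPy array of Axes objects.
fig, ax = plt.subplots(3, 2, sharey=True)
ax[0, 0].plot(x, [1, 2, 3, 4])      # top-left panel
ax[2, 1].plot(x, [10, 20, 30, 40])  # bottom-right panel
ax[2, 1].set_xlabel("x")
plt.show()
```

Because sharey=True, every panel ends up with the same y-limits, so the top-left line looks flat next to the larger values. Note that with a single row or a single column, plt.subplots returns a 1-D array, indexed as ax[0], ax[1], and so on.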

Time Series Data
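For pandas to recognise dates, parse them as datetime, e.g. pd.read_csv("file.csv", parse_dates=["date"]) (file and column names are placeholders).
The date column can also be set as the index by adding index_col="date".
Plot against the index: ax.plot(df.index, df["column"])
Select a date range by slicing the index: df["start_date":"end_date"]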
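The steps above can be sketched end to end. The CSV text, the column names date and co2, and all values are made up for illustration; an in-memory string stands in for a file so the example is self-contained.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly values standing in for a CSV file on disk.
csv_text = """date,co2
2009-11-01,386.0
2009-12-01,387.3
2010-01-01,388.5
2010-02-01,389.9
2010-03-01,391.0
2010-04-01,392.5
2010-05-01,393.0
"""

# Parse the date column as datetime and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Slice rows by date strings; df.loc["2010-01-01":"2010-03-31"] is equivalent.
recent = df["2010-01-01":"2010-03-31"]

fig, ax = plt.subplots()
ax.plot(recent.index, recent["co2"])
ax.set_xlabel("Time")
ax.set_ylabel("CO2 (ppm)")
plt.show()
```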

Plotting two time series plots together

Use twin axes to plot two series with different y scales against the same x-axis:
ax2 = ax.twinx()
ax2.plot(df.index, df["column"], color='red')
ax2.set_ylabel('label', color='red') -- Good practice to colour each y-axis to match its line
ax2.tick_params('y', colors='red')
plt.show()
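A complete sketch of the twin-axes pattern, including the first Axes that the notes above leave out. The column names co2 and relative_temp follow the exercise at the end of these notes, but the values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly values for illustration only.
index = pd.date_range("2010-01-01", periods=6, freq="MS")
df = pd.DataFrame(
    {
        "co2": [388.5, 389.9, 391.0, 392.5, 393.0, 392.2],
        "relative_temp": [0.55, 0.60, 0.72, 0.66, 0.58, 0.61],
    },
    index=index,
)

fig, ax = plt.subplots()
ax.plot(df.index, df["co2"], color="blue")
ax.set_xlabel("Time")
ax.set_ylabel("CO2 (ppm)", color="blue")
ax.tick_params("y", colors="blue")

# Second Axes sharing the same x-axis but with its own y-axis on the right.
ax2 = ax.twinx()
ax2.plot(df.index, df["relative_temp"], color="red")
ax2.set_ylabel("Relative temperature (Celsius)", color="red")
ax2.tick_params("y", colors="red")
plt.show()
```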

Annotate time-series data

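ax2.annotate("text", xy=(x_coord, y_coord), xytext=(x_text, y_text), arrowprops={"arrowstyle": "->", "color": "gray"})
xy is the point being annotated; xytext is where the text is placed. For time-series data the x coordinates are dates, e.g. pd.Timestamp("YYYY-MM-DD").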
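A sketch of annotating a time series, assuming a made-up co2 series and an arbitrary threshold of 400. The point to annotate is found with a boolean mask, and the text position is chosen by hand.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly values for illustration only.
index = pd.date_range("2015-01-01", periods=6, freq="MS")
co2 = pd.Series([398.0, 398.9, 399.6, 400.4, 401.1, 401.8], index=index)

# First date on which the series exceeds the threshold.
first_above = co2[co2 > 400].index[0]

fig, ax = plt.subplots()
ax.plot(co2.index, co2)
ann = ax.annotate(
    ">400 ppm",
    xy=(first_above, co2.loc[first_above]),       # point being annotated
    xytext=(pd.Timestamp("2015-02-01"), 401.5),   # where the text sits
    arrowprops={"arrowstyle": "->", "color": "gray"},
)
plt.show()
```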

Chapter 3 - Quantitative comparisons and statistical visualisations

ax.bar(x, y)
ax.set_xticklabels(x, rotation=90) -- Rotates the x tick labels so long names do not overlap
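A small bar chart sketch with made-up country names and medal counts. Instead of set_xticklabels, this version rotates the labels with ax.tick_params(axis="x", labelrotation=90), an alternative that avoids re-specifying the labels; that substitution is a choice made here, not something from the notes.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and counts for illustration.
countries = ["Country A", "Country B", "Country C"]
gold = [10, 7, 4]

fig, ax = plt.subplots()
ax.bar(countries, gold)
ax.tick_params(axis="x", labelrotation=90)  # rotate x tick labels
ax.set_ylabel("Number of medals")
plt.show()
```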

Stacked bar:
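One common way to stack bars is the bottom keyword of ax.bar, which sets where each bar starts. The sketch below uses made-up counts; each new layer starts at the running total of the layers beneath it.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and counts for illustration.
countries = ["Country A", "Country B", "Country C"]
gold = [10, 7, 4]
silver = [8, 9, 5]
bronze = [6, 3, 7]

fig, ax = plt.subplots()
ax.bar(countries, gold, label="Gold")
# Silver bars start where the gold bars end.
ax.bar(countries, silver, bottom=gold, label="Silver")
# Bronze bars start at gold + silver.
gold_plus_silver = [g + s for g, s in zip(gold, silver)]
ax.bar(countries, bronze, bottom=gold_plus_silver, label="Bronze")
ax.set_ylabel("Number of medals")
ax.legend()
plt.show()
```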

Histograms

ax.hist(column, label="label", bins=100, histtype="step")
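A sketch of overlaying two histograms with histtype="step", which draws outlines only so neither distribution hides the other. The two samples are synthetic, drawn from normal distributions with arbitrary parameters.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic samples for illustration only.
rng = np.random.default_rng(0)
group_a = rng.normal(170, 8, size=500)
group_b = rng.normal(180, 6, size=500)

fig, ax = plt.subplots()
# ax.hist returns the bin counts, the bin edges, and the drawn patches.
counts_a, edges_a, _ = ax.hist(group_a, label="Group A", bins=20, histtype="step")
counts_b, edges_b, _ = ax.hist(group_b, label="Group B", bins=20, histtype="step")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Number of observations")
ax.legend()
plt.show()
```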

Statistical Plotting

Adding error bars to bar charts

Bar chart argument:

ax.bar(x, column.mean(), yerr=column.std())

Line plot method:

ax.errorbar(x, y, yerr=column) -- column holds the error (e.g. standard deviation) for each point
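Both error-bar approaches can be sketched side by side. All names and values are made up: the left panel shows bar heights as means with the standard deviation as yerr, and the right panel shows a line with per-point errors.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up heights (cm) for two hypothetical groups.
df = pd.DataFrame(
    {
        "Rowing": [185.0, 190.0, 178.0, 192.0, 181.0],
        "Gymnastics": [160.0, 165.0, 158.0, 170.0, 162.0],
    }
)

fig, (ax_bar, ax_line) = plt.subplots(1, 2)

# Bar chart: one bar per group, error bar = standard deviation.
for name in df.columns:
    ax_bar.bar(name, df[name].mean(), yerr=df[name].std())
ax_bar.set_ylabel("Height (cm)")

# Line plot: made-up monthly means with a made-up error for each point.
months = [1, 2, 3, 4]
temps = [5.0, 6.5, 9.0, 12.0]
temp_std = [1.2, 1.0, 1.5, 1.1]
ax_line.errorbar(months, temps, yerr=temp_std)
ax_line.set_xlabel("Month")
plt.show()
```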

Box Plots:

ax.boxplot([column1, column2])
ax.set_xticklabels(["label 1", "label 2"])
ax.set_ylabel("label 3")
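A runnable version of the box plot calls above, with two made-up samples. ax.boxplot takes a list of sequences and draws one box per sequence, at x positions 1, 2, and so on.

```python
import matplotlib.pyplot as plt

# Made-up heights (cm) for two hypothetical groups.
rowing = [185.0, 190.0, 178.0, 192.0, 181.0, 188.0]
gymnastics = [160.0, 165.0, 158.0, 170.0, 162.0, 155.0]

fig, ax = plt.subplots()
result = ax.boxplot([rowing, gymnastics])  # dict of the drawn artists
ax.set_xticklabels(["Rowing", "Gymnastics"])
ax.set_ylabel("Height (cm)")
plt.show()
```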

Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

  • Using austin_weather and seattle_weather, create a Figure with an array of two Axes objects that share a y-axis range (MONTHS in this case). Plot Seattle's and Austin's MLY-TAVG-NORMAL (for average temperature) in the top Axes and plot their MLY-PRCP-NORMAL (for average precipitation) in the bottom axes. The cities should have different colors and the line style should be different between precipitation and temperature. Make sure to label your viz!
  • Using climate_change, create a twin Axes object with the shared x-axis as time. There should be two lines of different colors not sharing a y-axis: co2 and relative_temp. Only include dates from the 2000s and annotate the first date at which co2 exceeded 400.
  • Create a scatter plot from medals comparing the number of Gold medals vs the number of Silver medals with each point labeled with the country name.
  • Explore if the distribution of Age varies in different sports by creating histograms from summer_2016.
  • Try out the different Matplotlib styles available and save your visualizations as a PNG file.
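For the last exercise, a minimal sketch of switching style and saving to PNG. The style name "ggplot" is one of Matplotlib's built-in styles (plt.style.available lists them all); the output file name and the temporary-directory location are arbitrary choices for this example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

plt.style.use("ggplot")  # applies to figures created after this call

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])  # made-up values
ax.set_xlabel("x")
ax.set_ylabel("x squared")

# Save to the system temp directory; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "example_plot.png")
fig.savefig(out_path, dpi=150)
```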