Introduction to Data Visualization with Matplotlib
Run the hidden code cell below to import the data used in this course.
Chapter 1 - Introduction
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(x_data, y_data)
plt.show()
ax.plot(x_data, y_data, marker="o") draws circle markers on the line; "v" gives triangles, etc.
ax.plot(x_data, y_data, linestyle="--") gives a dashed line; "None" removes the line.
ax.plot(x_data, y_data, color="r") makes the line red.
ax.set_xlabel("label") - Set x label
ax.set_title("title") - Set title
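The customisation options above can be combined in a single call. The following is a minimal sketch; the month names and temperatures are made-up values for illustration, not the course data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: average temperature for four months
months = ["Jan", "Feb", "Mar", "Apr"]
temps = [5.0, 6.5, 9.0, 12.5]

fig, ax = plt.subplots()

# Circle markers, dashed line, red colour
line, = ax.plot(months, temps, marker="o", linestyle="--", color="r")

ax.set_xlabel("Month")
ax.set_ylabel("Average temperature (C)")
ax.set_title("Monthly temperature")
plt.show()
```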
fig, ax = plt.subplots(3, 2) -- Creates small multiples, 3 rows by 2 columns. ax is then a 2-D array of Axes.
ax[0, 0].plot(x_data, y_data) Same as above, but the index specifies which small multiple to draw on
fig, ax = plt.subplots(3, 2, sharey=True) - All subplots share the same y axis scale
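A short sketch of small multiples with a shared y axis. The data values are invented for illustration; only two of the six panels are filled in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data on very different scales
x = [1, 2, 3]
small = [1, 4, 9]
large = [10, 20, 30]

# 3 rows, 2 columns; sharey=True gives every panel the same y-axis range
fig, ax = plt.subplots(3, 2, sharey=True)

ax[0, 0].plot(x, small)   # top-left panel
ax[2, 1].plot(x, large)   # bottom-right panel

ax[0, 0].set_title("small values")
ax[2, 1].set_title("large values")
plt.show()
```

Because the y axis is shared, the top-left panel is drawn on the same scale as the bottom-right one, which makes the panels directly comparable.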
Time Series Data
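For pandas to recognise dates, parse them as datetime when reading the file (e.g. parse_dates=["date"] in pd.read_csv).
The date column can also be set as the index (index_col="date").
Plot against the index: ax.plot(df.index, df["column"])
df["start_date":"end_date"] - Select the rows between two dates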
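The steps above might look like this in practice. This is a sketch only: the CSV text, the column names date and co2, and the values are assumptions made up for the example.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """date,co2
1999-12-01,368.0
2000-01-01,369.1
2000-06-01,371.8
2001-01-01,370.2
"""

# Parse the date column as datetime and use it as the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

# Select the rows for the year 2000; .loc makes the row slice explicit
subset = df.loc["2000-01-01":"2000-12-31"]

fig, ax = plt.subplots()
ax.plot(subset.index, subset["co2"])
ax.set_xlabel("Time")
ax.set_ylabel("CO2 (ppm)")
plt.show()
```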
Plotting two time series together
Use twin axes to plot two series with different y scales against a shared x axis:
ax2 = ax.twinx()
ax2.plot(df.index, df["column"])
ax2.set_ylabel('label', color='red') -- Good practice to match the label colour to the line colour
ax2.tick_params('y', colors = 'red')
plt.show()
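A fuller sketch of the twin-axes pattern, colouring each y axis to match its line. The DataFrame below, with columns co2 and relative_temp, is built from invented values purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data indexed by date
dates = pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"])
df = pd.DataFrame(
    {"co2": [369.1, 369.5, 370.6, 371.8],
     "relative_temp": [0.30, 0.42, 0.35, 0.48]},
    index=dates,
)

fig, ax = plt.subplots()

# First series on the left-hand y axis
ax.plot(df.index, df["co2"], color="blue")
ax.set_xlabel("Time")
ax.set_ylabel("CO2 (ppm)", color="blue")
ax.tick_params("y", colors="blue")

# Second series on a new y axis that shares the same x axis
ax2 = ax.twinx()
ax2.plot(df.index, df["relative_temp"], color="red")
ax2.set_ylabel("Relative temperature (C)", color="red")
ax2.tick_params("y", colors="red")

plt.show()
```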
Annotate time-series data
ax2.annotate("text", xy=(x_coord, y_coord), xytext=(x_text, y_text), arrowprops={"arrowstyle": "->", "color": "gray"})
xy is the point being annotated; xytext is where the text is placed.
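As a sketch, an annotation on a time-series plot can use pd.Timestamp for the x coordinates. The dates, values, and annotation text below are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly values
dates = pd.to_datetime(
    ["2010-01-01", "2011-01-01", "2012-01-01",
     "2013-01-01", "2014-01-01", "2015-01-01"]
)
relative_temp = [0.72, 0.61, 0.65, 0.68, 0.75, 1.02]

fig, ax = plt.subplots()
ax.plot(dates, relative_temp)

# xy is the data point to highlight; xytext is where the label sits.
# arrowprops draws an arrow from the text to the point.
ann = ax.annotate(
    "First value above 1",
    xy=(pd.Timestamp("2015-01-01"), 1.02),
    xytext=(pd.Timestamp("2011-01-01"), 0.95),
    arrowprops={"arrowstyle": "->", "color": "gray"},
)
plt.show()
```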
Chapter 3 - Quantitative comparisons and statistical visualisations
ax.bar(x,y)
ax.set_xticklabels(x, rotation=90)
Stacked bar:
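One common way to build a stacked bar chart is to draw a second set of bars with the bottom argument set to the heights of the first set. The sketch below assumes invented country names and medal counts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical medal counts
countries = ["A", "B", "C"]
gold = [10, 5, 3]
silver = [8, 6, 4]

fig, ax = plt.subplots()

# First layer sits on the x axis
gold_bars = ax.bar(countries, gold, label="Gold")

# Second layer starts where the first layer ends
silver_bars = ax.bar(countries, silver, bottom=gold, label="Silver")

ax.set_ylabel("Number of medals")
ax.legend()
plt.show()
```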
Histograms
ax.hist(column, label="label", bins=100, histtype="step")
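A sketch comparing two distributions with step histograms. The ages are randomly generated stand-ins, not the course data, and the sport names are only labels for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical ages drawn from normal distributions
rng = np.random.default_rng(0)
rowing_age = rng.normal(27, 4, 200)
gymnastics_age = rng.normal(22, 3, 200)

fig, ax = plt.subplots()

# histtype="step" draws outlines only, so overlapping histograms stay readable
counts, edges, _ = ax.hist(rowing_age, label="Rowing", bins=20, histtype="step")
ax.hist(gymnastics_age, label="Gymnastics", bins=20, histtype="step")

ax.set_xlabel("Age")
ax.set_ylabel("# of observations")
ax.legend()
plt.show()
```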
Statistical Plotting
Adding error bars to bar charts
Bar chart argument:
ax.bar(x, column.mean(), yerr=column.std())
Line plot method:
ax.errorbar(x,y, yerr=column)
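Both approaches can be sketched as follows. The heights, temperatures, and standard deviations are invented values used only to make the example runnable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical heights (cm) for two groups
rowing = pd.Series([185, 190, 178, 182, 195])
gymnastics = pd.Series([160, 165, 158, 170, 162])

# Bar chart: bar height is the mean, error bar is the standard deviation
fig, ax = plt.subplots()
ax.bar("Rowing", rowing.mean(), yerr=rowing.std())
ax.bar("Gymnastics", gymnastics.mean(), yerr=gymnastics.std())
ax.set_ylabel("Height (cm)")

# Line plot: one error bar per point
months = [1, 2, 3, 4]
avg_temp = [5.0, 6.5, 9.0, 12.5]
temp_std = [1.2, 1.0, 1.5, 1.1]

fig2, ax2 = plt.subplots()
eb = ax2.errorbar(months, avg_temp, yerr=temp_std)
ax2.set_ylabel("Temperature (C)")
plt.show()
```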
Box Plots:
ax.boxplot([column1, column2])
ax.set_xticklabels(["label 1", "label 2"])
ax.set_ylabel("label 3")
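A small sketch of a box plot comparing two columns. The height values and group names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical heights (cm) for two groups
rowing = [185, 190, 178, 182, 195, 188]
gymnastics = [160, 165, 158, 170, 162, 155]

fig, ax = plt.subplots()

# One box per list, drawn at x positions 1 and 2
result = ax.boxplot([rowing, gymnastics])

# Label the boxes in the same order as the data
ax.set_xticklabels(["Rowing", "Gymnastics"])
ax.set_ylabel("Height (cm)")
plt.show()
```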
Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Using austin_weather and seattle_weather, create a Figure with an array of two Axes objects that share a y-axis range (MONTHS in this case). Plot Seattle's and Austin's MLY-TAVG-NORMAL (for average temperature) in the top Axes and plot their MLY-PRCP-NORMAL (for average precipitation) in the bottom Axes. The cities should have different colors and the line style should be different between precipitation and temperature. Make sure to label your viz!
- Using climate_change, create a twin Axes object with the shared x-axis as time. There should be two lines of different colors not sharing a y-axis: co2 and relative_temp. Only include dates from the 2000s and annotate the first date at which co2 exceeded 400.
- Create a scatter plot from medals comparing the number of Gold medals vs the number of Silver medals with each point labeled with the country name.
- Explore if the distribution of Age varies in different sports by creating histograms from summer_2016.
- Try out the different Matplotlib styles available and save your visualizations as a PNG file.