Line Plots in Matplotlib with Python
Data visualization and storytelling are vital for data scientists because they turn complex data insights into compelling, easily digestible narratives. While newer and fancier libraries keep being released, Matplotlib remains one of the most popular plotting libraries and serves as the foundation for many of the newer ones.
This tutorial focuses on one of the most common types of Matplotlib plots, the line plot. Line plots are excellent at showcasing trends and fluctuations in data over time, connecting the dots (literally) to paint a vivid picture of what’s happening.
This tutorial starts with the basics of creating a simple line plot and then moves on to more advanced techniques, such as adding statistical information to plots. By the end of this tutorial, you will have a solid understanding of how to create different types of line plots in Matplotlib and how to use them to communicate your data insights to others effectively.
Are you ready to enhance your data visualization skills? Let’s begin!
The Libraries, Data, and Pre-Processing
Before we start creating line plots with Matplotlib, we must set up our environment. This involves installing Matplotlib, importing the required libraries, and pre-processing the dataset that we will use for our examples.
Installing Matplotlib
To install Matplotlib, you can use pip, the package installer for Python. Simply open a terminal or command prompt and type:
pip install matplotlib
This will install the latest version of Matplotlib on your machine.
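To confirm that the installation worked, one quick check is to import the package and print its version. This is only a small sanity check; the version string you see depends on your environment.

```python
import matplotlib

# Read the installed version string; an ImportError here means the install failed
version = matplotlib.__version__
print(version)
```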
Importing the required libraries
Once Matplotlib is installed, we must import it along with the other required libraries, NumPy and Pandas. NumPy is a library for working with arrays, while Pandas is for data manipulation and analysis.
To import these libraries, simply type the following code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Reading and pre-processing the data
For this tutorial, we will be using a dataset containing the daily prices of the DJIA index. The dataset includes five columns:
- Date: The date on which the remaining price information was recorded
- Open, Close: The price of DJIA at the opening and closing of the stock market for that particular day
- High, Low: The highest and lowest price the DJIA reached during the particular day
After loading the dataset, we do some basic pre-processing: renaming the price columns to remove a leading space from their names, converting the Date column to a datetime type, and sorting the data in ascending order of date.
Here’s the code for the above:
# Load the dataset into a Pandas DataFrame
df = pd.read_csv("HistoricalPrices.csv")
# Rename the price columns to remove the leading space in their names
df = df.rename(columns = {' Open': 'Open', ' High': 'High', ' Low': 'Low', ' Close': 'Close'})
# Convert the date column to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Sort the dataset in the ascending order of date
df = df.sort_values(by = 'Date')
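If you want to try the pre-processing steps without the CSV file, the sketch below runs them on a few made-up rows held in memory. The values are invented for illustration (they are not real DJIA prices), and stripping whitespace from all column names with `df.columns.str.strip()` is an alternative to the explicit rename above, not the approach used in this tutorial.

```python
import io

import pandas as pd

# Made-up sample rows (not real DJIA prices); note the space after each comma,
# which produces column names such as ' Open'
csv_text = """Date, Open, High, Low, Close
2023-01-05, 101.0, 103.5, 100.2, 102.8
2023-01-03, 100.0, 101.2, 99.1, 100.9
2023-01-04, 100.9, 102.0, 100.1, 101.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Strip leading/trailing whitespace from every column name
df.columns = df.columns.str.strip()

# Convert the date column to datetime and sort in ascending order of date
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values(by="Date").reset_index(drop=True)

print(df.columns.tolist())
print(df["Date"].tolist())
```

Resetting the index after sorting is optional; it just keeps the row labels in the same order as the dates.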
Now that we have set up the environment and loaded the dataset, we can move on to creating line plots using Matplotlib.
Creating a Basic Line Plot in Matplotlib
We will start by creating a basic line plot and then customize the line plot to make it look more presentable and informative.
Using plt.plot() to create a line plot
To create a line plot, we will use the plt.plot() function. In its simplest form, this function takes two arguments: the x-axis values and the y-axis values. In our case, the Date column provides the x-axis values, while the Close column provides the y-axis values. Here is the code:
# Extract the date and close price columns
dates = df['Date']
closing_price = df['Close']
# Create a line plot
plt.plot(dates, closing_price)
# Show the plot
plt.show()
When you run the above code, you should see a basic line plot of the DJIA closing prices.
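The same call can be tried on synthetic data if the dataset is not at hand. The following is a minimal sketch, assuming a made-up random-walk series in place of the real closing prices; the Agg backend line is only there so the snippet runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: 30 business days of a random-walk "price" (not real DJIA values)
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2023-01-02", periods=30, freq="B")
closing_price = 100 + rng.normal(0, 1, size=30).cumsum()

# plt.plot() returns a list with one Line2D object per plotted line
lines = plt.plot(dates, closing_price)

# In a script or notebook, call plt.show() here to display the figure
```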
Customizing the Line Plot
Matplotlib offers many further customizations, which we can apply as needed.
Setting the line color
By default, the plt.plot() function plots a blue line. However, you can change the line color by passing a color parameter to the function. The color parameter can take a string representing the color name or a hexadecimal code.
Here is an example:
# Plot in Red colour
plt.plot(dates, closing_price, color='red')
# Show the plot
plt.show()
This code will plot a red line instead of a blue one as shown below:
Basic line plot in red
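Both forms of the color parameter can be compared side by side. This sketch uses made-up values and draws one line with a color name and one with a hexadecimal code; `#1f77b4` is the hex code of Matplotlib's default blue.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [10, 12, 11, 14, 13]  # made-up values

fig, ax = plt.subplots()

# Color given by name
(named_line,) = ax.plot(x, y, color="red")

# Color given as a hexadecimal code, shifted up so the lines do not overlap
(hex_line,) = ax.plot(x, [v + 2 for v in y], color="#1f77b4")
```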
Setting the line width
You can also change the line width by passing a linewidth parameter to the plt.plot() function. The linewidth parameter takes a floating-point value representing the line's width.
Here is an example:
# Increasing the linewidth
plt.plot(dates, closing_price, linewidth=3)
# Show the plot
plt.show()
This code will plot a line with a width of 3 instead of the default width as shown below:
Thicker lines in the plot due to higher linewidth
Setting the line style
You can change the line style by passing a linestyle parameter to the plt.plot() function. The linestyle parameter takes a string that represents the line style. The Matplotlib documentation provides an extensive list of the available styles.
Here’s how these can be used in code:
# Plot the line once per style: solid (the default), dotted, dashed and dashdot
for style in ['solid', 'dotted', 'dashed', 'dashdot']:
    plt.plot(dates, closing_price, linestyle=style)
    plt.title(f"linestyle='{style}'")
    # Show each plot separately so the styles do not overlap
    plt.show()
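To compare the styles at a glance, one option is to give each style its own panel in a single figure. This is a sketch on made-up values, using `plt.subplots()` to create the stacked panels; the panel titles are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

x = list(range(10))
y = [v ** 0.5 for v in x]  # made-up values

styles = ["solid", "dotted", "dashed", "dashdot"]

# One row per style, all sharing the same x-axis
fig, axes = plt.subplots(len(styles), 1, sharex=True, figsize=(6, 8))
for ax, style in zip(axes, styles):
    ax.plot(x, y, linestyle=style)
    ax.set_title(f"linestyle='{style}'")

# Adjust spacing so the titles do not collide with the panels above them
fig.tight_layout()
```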
Adding markers to line plots
Markers can be used to highlight specific points in the line plot. Many kinds of symbols can be used as markers; the Matplotlib documentation lists them.
Here is an example of using markers in a line plot:
# Add a cross marker for each point
plt.plot(df['Date'], df['Close'], marker='x')
# Show the plot
plt.show()
In the above code, we use cross (x) markers to highlight each Close price of the DJIA.
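With many data points, a marker on every point can clutter the plot. One way to thin them out is the `markevery` property, sketched here on a made-up sine series; the step of 5 and the marker size are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(50)
y = np.sin(x / 5.0)  # made-up values

fig, ax = plt.subplots()

# Draw the full line, but place a cross marker only on every 5th point
(line,) = ax.plot(x, y, marker="x", markevery=5, markersize=8)
```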
Adding labels and title
To make the plot more informative, we can add axis labels and a title. We can achieve this by using the plt.xlabel(), plt.ylabel(), and plt.title() functions, respectively.
Here is an example:
plt.plot(dates, closing_price, color='red', linewidth=2)
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.title('DJIA Stock Price')
# Show the plot
plt.show()
This code will plot a red line with a width of 2, with the x-axis labeled ‘Date,’ the y-axis labeled ‘Closing Price,’ and the title ‘DJIA Stock Price.’
Adding grid lines
We can also add grid lines to our plot to make it more readable. We can achieve this by using the plt.grid() function. The plt.grid() function takes a boolean value representing whether the grid should be shown.
Here is an example:
plt.plot(dates, closing_price, color='red', linewidth=2)
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.title('DJIA Stock Price')
# Add the grid
plt.grid(True)
# Show the plot
plt.show()
The plot now includes grid lines.
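The labels, title, and grid can also be set through an Axes object instead of the plt functions. This is a sketch of that object-oriented style on made-up values; the dashed, semi-transparent grid styling is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [100, 102, 101, 105, 104]  # made-up values

fig, ax = plt.subplots()
ax.plot(x, y, color="red", linewidth=2)

# Axes methods corresponding to plt.xlabel(), plt.ylabel(), and plt.title()
ax.set_xlabel("Day")
ax.set_ylabel("Value")
ax.set_title("Made-up series")

# Extra keyword arguments style the grid lines themselves
ax.grid(True, linestyle="--", alpha=0.5)
```

The Axes methods become more convenient once a figure has more than one set of axes, because each call then states explicitly which axes it changes.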
Matplotlib Line Plots with Multiple Lines
In some cases, you may want to plot multiple lines on the same graph. To do this, you can call the plt.plot() function multiple times with different data for each call. Here is an example:
# Line plot of Open and Close prices
plt.plot(df['Date'], df['Open'], label='Open')
plt.plot(df['Date'], df['Close'], label='Close')
plt.title('DJIA Open and Close Prices')
plt.xlabel('Date')
plt.ylabel('Price')
# Add a legend so the two lines can be told apart
plt.legend()
plt.show()
In the above code, we are plotting both the Open and Close prices of the DJIA on the same graph.
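When there are more than two series, repeating the plot call by hand gets tedious. A loop over column names, with a label per line and a legend, is one way to handle it. The DataFrame below holds made-up values (not real DJIA prices) with the same column names as the tutorial's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values with the same column names as the tutorial's dataset
df = pd.DataFrame({
    "Date": pd.date_range("2023-01-02", periods=4, freq="D"),
    "Open": [100.0, 100.9, 101.0, 102.8],
    "High": [101.2, 102.0, 103.5, 103.0],
    "Low": [99.1, 100.1, 100.2, 101.5],
    "Close": [100.9, 101.0, 102.8, 102.1],
})

fig, ax = plt.subplots()

# One plot call per column; the label is what the legend displays
for column in ["Open", "High", "Low", "Close"]:
    ax.plot(df["Date"], df[column], label=column)

legend = ax.legend()
```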
Matplotlib Line Plots with Twin Axes
There might be cases where you want to represent two variables with different scales on the same plot. In such situations, using twin axes is an effective way to visualize the relationship between the variables without losing the clarity of the individual scales.
To create a line plot with twin axes, we need to use the twinx() method of an Axes object. This method creates a new y-axis that shares the same x-axis as the original plot.
Here's an example:
# Create a new variable for demonstration purposes
df['High_minus_Low'] = df['High'] - df['Low']
# Create a basic line plot for the Close prices
fig, ax1 = plt.subplots()
ax1.plot(df['Date'], df['Close'], color='blue', label='Close Price')
ax1.set_xlabel('Date')
ax1.set_ylabel('Closing Price', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')
# Create a twin axis for the High_minus_Low variable
ax2 = ax1.twinx()
ax2.plot(df['Date'], df['High_minus_Low'], color='green', label='High - Low')
ax2.set_ylabel('High minus Low', color='green')
ax2.tick_params(axis='y', labelcolor='green')
# Add a title and show the plot
plt.title('DJIA Stock Price and High-Low Difference with Twin Axes')
plt.show()
The resulting plot shows the closing price on the left y-axis and the high-low difference on the right y-axis.
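A legend called on one axis lists only the lines drawn on that axis. To show the lines from both axes in a single legend, one approach is to collect the handles and labels from each axis and pass them to one legend call. This sketch uses made-up values, and the legend position is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
close = [100.9, 101.0, 102.8, 102.1, 103.4]  # made-up values
spread = [2.1, 1.9, 3.3, 1.5, 2.1]           # made-up High minus Low

fig, ax1 = plt.subplots()
ax1.plot(x, close, color="blue", label="Close Price")

ax2 = ax1.twinx()
ax2.plot(x, spread, color="green", label="High - Low")

# Gather the legend entries from both axes and draw them as one legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
legend = ax2.legend(handles1 + handles2, labels1 + labels2, loc="upper left")
```

The legend is attached to the twin axis here because that axis is drawn last, so the legend is not covered by its line.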
Adding Statistical Information to Matplotlib Line Plots
In addition to visualizing trends and patterns in data, line plots can also display statistical information such as regression lines and error bars.
Adding a Matplotlib Regression Line
A regression line is a line that best fits the data points in a plot and can be used to model and predict future values. We can add a regression line to our line plot by using the polyfit() function from the NumPy library, which fits a polynomial of a given degree to our data points. With a degree of 1, the fitted polynomial is a straight line.
import matplotlib.dates as mdates
# Convert the dates to numeric values for fitting, keeping the Date column intact
date_num = mdates.date2num(df['Date'])
# Fit a degree-1 polynomial (a straight line) to the Open and Close prices
coefficients_open = np.polyfit(date_num, df['Open'], 1)
p_open = np.poly1d(coefficients_open)
coefficients_close = np.polyfit(date_num, df['Close'], 1)
p_close = np.poly1d(coefficients_close)
# Plot the data points as dots and the regression lines as solid lines
fig, ax = plt.subplots()
ax.plot(df['Date'], df['Open'], '.', label='Open Price')
ax.plot(df['Date'], p_open(date_num), '-', label='Open Regression Line')
ax.plot(df['Date'], df['Close'], '.', label='Close Price')
ax.plot(df['Date'], p_close(date_num), '-', label='Close Regression Line')
ax.set_title('DJIA Stock Prices')
ax.set_xlabel('Date')
ax.set_ylabel('Price')
ax.legend()
# Format x-axis labels as dates
date_form = mdates.DateFormatter("%Y-%m-%d")
ax.xaxis.set_major_formatter(date_form)
plt.gcf().autofmt_xdate()
plt.show()
In this code, we first convert the dates to numeric values using the date2num() function and then use the polyfit() function to obtain the coefficients of the regression line. We then pass the coefficients to the poly1d() function, which returns a function that evaluates the line at any x value. We plot the original data points using dots and the regression line using a solid line.
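To see what polyfit() and poly1d() return, it helps to fit points whose answer is known in advance. In this sketch the made-up points lie exactly on y = 2x + 1, so the fitted slope and intercept should come out as approximately 2 and 1.

```python
import numpy as np

# Made-up points that lie exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# For degree 1, polyfit returns [slope, intercept] (highest power first)
slope, intercept = np.polyfit(x, y, 1)

# poly1d turns the coefficients into a callable polynomial
line = np.poly1d([slope, intercept])

# Evaluate the fitted line at a new x value to extrapolate
prediction = line(5.0)
print(slope, intercept, prediction)
```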
Adding Error Bars
Error bars are a graphical representation of the variability of data and can be used to indicate the uncertainty in the measurements.
This is particularly useful when you expect some measurement error in the data collection process, as with temperature or air quality data. Stock prices are recorded exactly, but for demonstration purposes let’s assume a potential error of half a standard deviation and plot it using the errorbar() function in Matplotlib.
# Calculate standard deviation of data
std = df['Close'].std()
# Add error bars to plot
plt.errorbar(df['Date'], df['Close'], yerr=std/2, fmt='.', label='Close Price')
plt.title('DJIA Stock Prices')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
In this code, we first calculate the standard deviation of the Close prices in our dataset. We then use the errorbar() function to add error bars to the line plot, with the error bar size set to half of the standard deviation.
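The yerr argument does not have to be a single number. As a variation on the constant error used above, the sketch below passes a two-row array so that each point gets its own lower and upper bar, here running from the day's Low to its High. The values are made up, and this is a different use of errorbar() from the tutorial's half-standard-deviation example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up values (not real DJIA prices), with Low <= Close <= High on each day
x = np.arange(5)
close = np.array([100.9, 101.0, 102.8, 102.1, 103.4])
low = np.array([99.1, 100.1, 100.2, 101.5, 101.9])
high = np.array([101.2, 102.0, 103.5, 103.0, 104.0])

# Row 0 holds the distance below each point, row 1 the distance above it
yerr = np.vstack([close - low, high - close])

fig, ax = plt.subplots()
container = ax.errorbar(x, close, yerr=yerr, fmt=".", capsize=3,
                        label="Close (bars span Low to High)")
ax.legend()
```

Both rows of yerr must be non-negative, which holds here because every Close lies between that day's Low and High.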
These techniques allow us to add statistical information to our line plots and gain deeper insights into our data.
Conclusion
Line plots are a powerful tool for visualizing trends and patterns in data, and Matplotlib provides a user-friendly interface to create them.
As a next step, you might want to follow our Intermediate Python course, where you apply everything you’ve learned to a hacker statistics case study.
We hope this tutorial has helped get you started with creating line plots in Matplotlib. We’ve also covered the other Matplotlib plot types in another tutorial, which briefly introduces what else you can do with Matplotlib.
Keep exploring and experimenting with creating stunning visualizations and uncovering insights from your data!