Skip to main content
HomeTutorialsData Visualization

Data Visualization with Bokeh Tutorial: Plotting Data Structures

Learn basic visualizations of various data structures using Bokeh.
Mar 2022  · 6 min read

In our first Data Visualization with Bokeh tutorial, you learned how to build visualizations from scratch using glyphs. Now, you can extend this concept and build plots from different data structures such as arrays and dataframes. In this tutorial, you will learn how to plot data with NumPy arrays, dataframes in Pandas, and ColumnDataSource using Bokeh. The ColumnDataSource is the essence of Bokeh, making it possible to share data over multiple plots and widgets.

Bokeh Plots from NumPy Arrays

Creating line plots from NumPy arrays

# Import packages
import numpy as np
import random
from bokeh.io import output_file, show
from bokeh.plotting import figure

# Create an array
x_array = np.array([10,20,30,40,50,60])
y_array = np.array([50,60,70,80,90,100])

# Create a line plot
plot = figure()
plot.line(x_array, y_array)

# Output plot
show(plot)

Creating scatter plots from NumPy arrays

# Import packages
import numpy as np
import random
from bokeh.io import output_file, show
from bokeh.plotting import figure

# Create arrays
x_arr = np.array([10,11,12,13,14,15,16])
y_arr = np.array([17,18,19,20,21,22,23])

# Create plot
plot = figure(title = "Scatter plot", x_axis_label = "Label name of x axis", y_axis_label ="Label name of y axis")
plot.circle(x_arr,y_arr, size = 30, alpha = 0.5)

# show plot
show(plot)

This yields the following graph:

Creating Plots Using Pandas Dataframes

Plotting lines with Pandas dataframes

The data in this example is obtained from Kaggle. The dataset contains all the stocks contained in the S&P 500 index for five years up to December 2018. The file contains seven columns:

  • Date - day of the week date in format yy-mm-dd
  • Open - stock’s opening price
  • High - stock’s highest price in the day
  • Low - stock’s lowest price of the day
  • Close - stock’s closing price for the day
  • Volume - volume of stocks traded in the day
  • Name - stock ticker name

We want to see the trend for Google’s stock over the five-year period. First, we load the data using Pandas and then check the first five rows of the Pandas dataframe.

# Import packages
import pandas as pd
# Load data
df = pd.read_csv('all_stocks_5yr.csv')
df.head()
# Focus on Google's stock
df_goog = df[df['Name'] == 'GOOG']
# Convert date column to a time series
df_goog['date'] = pd.to_datetime(df_goog['date'])

plot = figure(title= "Pandas Line Plot", x_axis_type = 'datetime', x_axis_label = 'date', y_axis_label= 'Prices')
plot.line(x = df_goog['date'], y = df_goog['high'])

# Output plot
show(plot)

In the chart, it was necessary to convert the date column to datetime to render dates along the x-axis. Additionally, the line function (glyph in Bokeh) is used to generate a time series plot for Google's high prices.

Creating a scatter plot from Pandas dataframe

In this section, we use the open-source S&P 500 stock data available on Kaggle. In this example, we want to see the correlation between Google's stock prices and Apple's stock prices. You can learn more about correlation here.

# Import required packages
# Import packages
import pandas as pd
from bokeh.io import output_file, show
from bokeh.plotting import figure

# Load data
df = pd.read_csv('all_stocks_5yr.csv')

# Select Google and Apple stock
df_goog = df[df['Name'] == 'GOOG']
df_apple = df[df['Name'] == 'AAPL']

# Select only 'date' and 'high' columns for Google and Apple
df_goog = df_goog[['date','high']]
df_apple = df_apple[['date','high']]
# Rename 'high' column to 'high_goog' and 'high_apple'
df_goog.rename(columns={'high': 'high_goog'}, inplace=True)
df_apple.rename(columns={'high': 'high_apple'}, inplace=True)

df_merged = pd.merge(df_goog, df_apple, how='inner', on='date')
# See the first five rows
df_merged.head(5)

Merging ensures that the daily prices correspond to each other based on date. This mitigates the problem of correlating prices that are from different dates, as this will yield false statistical inference.

# Create scatter plot
plot = figure(title= "Pandas Scatter Plot", x_axis_label = "Apple high prices", y_axis_label= "Google high prices")
plot.circle(x = df_merged['high_apple'], y = df_merged['high_goog'],color = 'red', size = 10, alpha = 0.8)

# Output the plot
show(plot)

The circles represent the link between Apple and Google's high prices. There is a general positive correlation between Google and Apple’s stock prices.

Creating Bokeh Plots from Columndatasource

The purpose of ColumnDataSource is to allow you to create different plots using the same data. In short, it allows you to build a foundation of data for calling in multiple plots and analyses. This will save you time, as you won't have to load data multiple times in Jupyter Notebook.

The ColumnDataSource is akin to a python dictionary, where the value of the data in a column has a key string name for that very column.

Creating line plots using ColumnDataSource

# Import packages
from bokeh.io import output_file, show
from bokeh.plotting import figure, ColumnDataSource

# Create the ColumnDataSource object
df_goog['date'] = pd.to_datetime(df_goog['date'])
data = ColumnDataSource(df_goog)

# Create time series plot
plot = figure(x_axis_type='datetime',
              title='GOOG high prices ts plot',
             x_axis_label = "Date",
              y_axis_label = "Prices in $")
plot.line(x='date', y = 'high_goog', source = data, color = 'green')
show(plot)

The plot shows us how Google’s high prices evolved over a 5-year period with the line indicating each price point of the stock.

To create this plot, we passed the entire Google stock dataframe into the ColumnDataSource function and saved it as a variable named data.

Then, we used the source argument to pass this data into the plot function. This makes it possible to use the column names directly when creating the plot instead of repeatedly calling the dataframe.

Finally, we used the line function to generate the green line to represent the data points.

Creating scatter plots using ColumnDataSource

In this code, we passed the entire Google stock dataframe into the ColumnDataSource (as in the line plot example). The dataframe is saved into the ColumnDataSource as a variable named data. We first create the line plot by directly calling the column names. Finally, we used the diamond function to generate the red diamonds to represent the individual data points.

# Import packages
from bokeh.io import output_file, show
from bokeh.plotting import figure, ColumnDataSource

# Create the ColumnDataSource object
df_goog['date'] = pd.to_datetime(df_goog['date'])
data = ColumnDataSource(df_goog)

# Create time series plot
plot = figure(x_axis_type='datetime',
              title='GOOG high prices ts plot',
             x_axis_label = "Date",
              y_axis_label = "Prices in $")
plot.line(x='date', y = 'high_goog', source = data, color = 'green')
plot.diamond(x='date', y = 'high_goog', source = data, color = 'red')
show(plot)

The plot shows the evolution of Google's stock over a 5-year period with the red diamonds indicating each price point of the stock.

Conclusion

This tutorial introduced the most common data structures that you are likely to encounter in Python and how you can use them to bring your data to life. While Bokeh is not your only option for Python visualization, it distinguishes itself from Matplotlib or Seaborn by its ease of use and interactive capabilities. In our next tutorial, you will gain hands-on experience in using layouts for creating effective and interactive visualizations. If you’d like to do a deep dive into all the data visualization possibilities offered by Bokeh, check out DataCamp’s Interactive Data Visualization with Bokeh course.

Topics
Related

AWS Certifications in 2024: Levels, Costs & How to Pass

Explore our full guide on AWS Certifications, including which one is best for you and how to pass the exams. Plus discover DataCamp resources to help!
Adel Nehme's photo

Adel Nehme

20 min

10 Top Data Analytics Conferences for 2024

Discover the most popular analytics conferences and events scheduled for 2024.
Javier Canales Luna's photo

Javier Canales Luna

7 min

Top 20 Snowflake Interview Questions For All Levels

Are you currently hunting for a job that uses Snowflake? Prepare yourself with these top 20 Snowflake interview questions to land yourself the job!
Nisha Arya Ahmed's photo

Nisha Arya Ahmed

15 min

[AI and the Modern Data Stack] Adding AI to the Data Warehouse with Sridhar Ramaswamy, SVP of AI at Snowflake

Richie and Sridhar explore Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, the challenges of enterprise search, management and the role of semantic layers in the effective use of AI, a look into Snowflakes products including Snowpilot and Cortex, advice for organizations looking to improve their data management, and much more.
Richie Cotton's photo

Richie Cotton

45 min

Mastering Slowly Changing Dimensions (SCD)

Level-up your data modeling skills by diving head-first into slowly changing dimensions. Sharpen your skills with hands-on examples using Snowflake, and identify common challenges and solutions when implementing SCD.
Jake Roach's photo

Jake Roach

12 min

Mastering Bayesian Optimization in Data Science

Unlock the power of Bayesian Optimization for hyperparameter tuning in Machine Learning. Master theoretical foundations and practical applications with Python to enhance model accuracy.
Zoumana Keita 's photo

Zoumana Keita

11 min

See MoreSee More