Data Visualization in Python for Absolute Beginners
Welcome to your webinar workspace! You can follow along as we go through some basic plot types using Python and Plotly!
To consult the solution, head over to the file browser and select notebook-solution.ipynb.
Basic charts
Histograms
For the first few examples, we will use some data on the top 50 bestselling books on Amazon. Let's start with a histogram. Histograms are used to visualize the distribution of a continuous variable.
In our case, we will visualize the distribution of prices of bestselling books from 2021.
We will first define the data we want to plot as a list. A list is a Python data type that stores multiple elements and is enclosed in square brackets ([]).
# Specify the data
prices_2021 = [7.48, 12.52, 17.78, 11.98, 7.49, 5.36, 6.99, 13.58, 14.34, 8.99, 6.62, 12.01, 18.0, 15.98, 10.62, 6.0, 10.58, 6.99, 4.31, 4.14, 10.26, 4.07, 14.4, 8.48, 8.49, 14.16, 26.0, 15.49, 4.79, 9.59, 11.6, 7.57, 13.99, 11.4, 10.35, 7.74, 13.79, 9.58, 13.09, 13.29, 19.42, 9.42, 10.34, 17.99, 14.8, 5.06, 8.55, 8.37, 14.89, 5.98]
# Preview the list
print(prices_2021)
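As a quick aside on lists, the sketch below shows indexing, length, and a couple of built-in summaries. It uses a shortened copy of the price list (the first five values only), and the name sample_prices is ours for illustration, not part of the notebook.

```python
# Illustrative subset: the first five prices from prices_2021
sample_prices = [7.48, 12.52, 17.78, 11.98, 7.49]

# Lists are ordered, so elements can be accessed by position (starting at 0)
first_price = sample_prices[0]

# Built-in functions summarize a list without any extra libraries
n_prices = len(sample_prices)
cheapest = min(sample_prices)
most_expensive = max(sample_prices)

print(first_price, n_prices, cheapest, most_expensive)
```

The same calls work on the full prices_2021 list.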
We then import plotly.express, initialize a figure object fig, and use the histogram() function to specify the data we want on the x-axis.
Lastly, we use the .show() method to generate the plot!
# Import Plotly Express
import plotly.express as px
# Initialize a histogram
fig = px.histogram(x=prices_2021)
# Show the plot
fig.show()
# Import Plotly Express
import plotly.express as px
# Initialize a histogram
fig = px.histogram(x=prices_2021,
nbins=5
)
# Show the plot
fig.show()
We can also specify the granularity of the histogram by choosing the number of bins (nbins), as in the second histogram above.
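To make the idea of bins more concrete, here is a minimal sketch of the counting a histogram performs, written in plain Python. It is not how Plotly chooses its bins internally. The bin width of 5 and the ten-value subset are assumptions made only for this illustration.

```python
from collections import Counter

# Illustrative subset: the first ten prices from prices_2021
sample_prices = [7.48, 12.52, 17.78, 11.98, 7.49, 5.36, 6.99, 13.58, 14.34, 8.99]

# Assumed bin width of 5 (dollars), chosen only for this illustration
bin_width = 5

# Map each price to the lower edge of its bin, then count the prices per bin
bin_counts = Counter(int(price // bin_width) * bin_width for price in sample_prices)

# Print one line per bin, from the cheapest bin to the most expensive
for lower_edge in sorted(bin_counts):
    print(f"{lower_edge} to {lower_edge + bin_width}: {bin_counts[lower_edge]} books")
```

Fewer, wider bins give a coarser picture of the distribution, while more, narrower bins show finer detail.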
Bar charts
Next, we will cover bar charts! Bar charts are a great way to compare a value, such as a count, percentage, or average, across the groups of a categorical variable.
For our bar chart, we will plot the average price by genre.
# Specify the data
genres = ["Fiction", "Non Fiction"]
average_prices = [10.6, 14.5]
# Preview the average prices
print(average_prices)
We then use the bar() function to generate our plot in the same way as our histogram.
# Initialize a bar plot
fig = px.bar(x=genres,
y=average_prices
)
# Show the plot
fig.show()
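The averages above were typed in by hand. As a rough sketch of how such values could be computed from raw prices, the example below builds the same two lists from a dictionary. The prices in prices_by_genre are invented for illustration and are not the webinar data.

```python
from statistics import mean

# Hypothetical prices per genre, invented for illustration only
prices_by_genre = {
    "Fiction": [8.0, 10.0, 12.0],
    "Non Fiction": [12.0, 14.0, 16.0],
}

# Build the two parallel lists a bar chart needs: categories and values
genres = list(prices_by_genre)
average_prices = [mean(prices) for prices in prices_by_genre.values()]

print(genres)
print(average_prices)
```

The resulting genres and average_prices lists can be passed to px.bar() as x and y, exactly as in the cell above.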
Line charts
Line charts are typically used to show how a variable (or variables) changes over time. With Plotly Express, it is incredibly easy to create a line chart.
Let's plot the total number of reviews for each year.
# Specify the data
years = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
total_reviews = [235506, 273981, 405041, 654546, 654907, 792997, 711669, 709800, 644420, 696521, 794917, 1790733, 2818117]
# Initialize a line plot
fig = px.line(x=years,
y=total_reviews
)
fig.update_layout()
# Show the plot
fig.show()
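The line chart shows a sharp rise in the last two years. To put a number on that, here is a small sketch that computes the year-over-year percentage change from the same two lists. The names growth_by_year and fastest_year are ours, introduced only for this example.

```python
# Same data as the line chart above
years = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
total_reviews = [235506, 273981, 405041, 654546, 654907, 792997, 711669, 709800, 644420, 696521, 794917, 1790733, 2818117]

# Compare each year with the one before it and store the percentage change
growth_by_year = {}
for i in range(1, len(years)):
    previous, current = total_reviews[i - 1], total_reviews[i]
    growth_by_year[years[i]] = (current - previous) / previous * 100

# Find the year with the largest percentage increase in reviews
fastest_year = max(growth_by_year, key=growth_by_year.get)
print(fastest_year, round(growth_by_year[fastest_year], 1))
```

A negative value in growth_by_year means the total number of reviews fell compared with the previous year.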
Questions
Scatter plots
Scatter plots are similar to line plots, except that the points are not joined by a line. They are a great way to visualize the relationship between two continuous variables.
Creating a scatter plot with Plotly is just as easy as creating a line chart.
We will use a pandas DataFrame instead of Python lists to make things easier. A DataFrame is a data structure composed of labelled columns and rows, much like a spreadsheet.