Data Visualization, Amazon best sellings
  • Data Visualization in Python for Absolute Beginners

    Welcome to your webinar workspace! You can follow along as we go through some basic plot types using Python and Plotly!

    To consult the solution, head over to the file browser and select notebook-solution.ipynb.

    Basic charts

    Histograms

    For the first few examples, we will use some data on the top 50 bestselling books on Amazon. Let's start with a histogram. Histograms are used to visualize the distribution of a continuous variable.

    In our case, we will visualize the distribution of prices of bestselling books from 2021.

    We will first define the data we want to plot as a list. A list is a Python data type that stores multiple elements, enclosed in square brackets ([]).

    # Specify the data
    prices_2021 = [7.48, 12.52, 17.78, 11.98, 7.49, 5.36, 6.99, 13.58, 14.34, 8.99, 6.62, 12.01, 18.0, 15.98, 10.62, 6.0, 10.58, 6.99, 4.31, 4.14, 10.26, 4.07, 14.4, 8.48, 8.49, 14.16, 26.0, 15.49, 4.79, 9.59, 11.6, 7.57, 13.99, 11.4, 10.35, 7.74, 13.79, 9.58, 13.09, 13.29, 19.42, 9.42, 10.34, 17.99, 14.8, 5.06, 8.55, 8.37, 14.89, 5.98]
    
    # Preview the list
    print(prices_2021)
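As a quick illustration of how lists behave, the sketch below uses only the first five prices from prices_2021 (the shorter name first_prices is ours, chosen for the example). It shows that elements keep their order, can be accessed by position, and can be summarized with built-in functions.

```python
# The first five values of prices_2021, kept short for illustration
first_prices = [7.48, 12.52, 17.78, 11.98, 7.49]

# Lists are ordered: positions start at 0, and -1 refers to the last element
print(first_prices[0])    # 7.48
print(first_prices[-1])   # 7.49

# Built-in functions summarize a list without any extra libraries
print(len(first_prices))  # 5
print(min(first_prices))  # 7.48
print(max(first_prices))  # 17.78
```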

    We then import plotly.express and use its histogram() function to create a figure object, fig, specifying the data we want on the x-axis.

    Lastly, we use the .show() method to generate the plot!

    # Import Plotly Express
    import plotly.express as px
    
    # Initialize a histogram
    fig = px.histogram(x=prices_2021)
    
    # Show the plot
    fig.show()

    We can also specify the granularity of the histogram by choosing the number of bins (nbins).

    # Initialize a histogram with a chosen number of bins
    fig = px.histogram(x=prices_2021,
                       nbins=5
                      )
    
    # Show the plot
    fig.show()
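To see what a histogram is counting, the binning can be reproduced by hand. The sketch below groups the 2021 prices into bins that are 5 units wide and counts the books in each one. The bin width is our assumption for illustration; Plotly chooses its own bin edges, so its bars may not match these counts exactly.

```python
from collections import Counter

prices_2021 = [7.48, 12.52, 17.78, 11.98, 7.49, 5.36, 6.99, 13.58, 14.34, 8.99,
               6.62, 12.01, 18.0, 15.98, 10.62, 6.0, 10.58, 6.99, 4.31, 4.14,
               10.26, 4.07, 14.4, 8.48, 8.49, 14.16, 26.0, 15.49, 4.79, 9.59,
               11.6, 7.57, 13.99, 11.4, 10.35, 7.74, 13.79, 9.58, 13.09, 13.29,
               19.42, 9.42, 10.34, 17.99, 14.8, 5.06, 8.55, 8.37, 14.89, 5.98]

# Assumed bin width (not taken from Plotly)
bin_width = 5

# Map each price to the lower edge of its bin, then count the prices per bin
bin_counts = Counter(int(price // bin_width) * bin_width for price in prices_2021)

# Print the bins in order; bins that contain no books are not listed
for lower_edge in sorted(bin_counts):
    upper_edge = lower_edge + bin_width
    print(f"{lower_edge:>2} to {upper_edge:>2}: {bin_counts[lower_edge]} books")
```

Each printed count corresponds to the height of one bar in a histogram that uses the same bins.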

    Bar charts

    Next, we will cover bar charts! Bar charts are a great way to compare a value, such as a count, percentage, or average, across the categories of a categorical variable.

    For our bar chart, we will plot the average price by genre.

    # Specify the data
    genres = ["Fiction", "Non Fiction"]
    average_prices = [10.6, 14.5]
    
    # Preview the list of average prices
    print(average_prices)

    We then use the bar() function to generate our plot in the same way as our histogram.

    # Initialize a bar plot
    fig = px.bar(x=genres,
                 y=average_prices
                )
    
    # Show the plot
    fig.show()
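The average prices above are given ready-made. As a rough sketch of how such per-genre averages could be computed from individual books, the example below groups prices by genre and averages each group. The books records are hypothetical values invented for this illustration, not the webinar's data.

```python
# Hypothetical (genre, price) records, invented purely for illustration
books = [
    ("Fiction", 8.0),
    ("Fiction", 12.0),
    ("Fiction", 10.0),
    ("Non Fiction", 15.0),
    ("Non Fiction", 13.0),
]

# Group the prices by genre
prices_by_genre = {}
for genre, price in books:
    prices_by_genre.setdefault(genre, []).append(price)

# Average each group; these values would become the bar heights
average_by_genre = {
    genre: sum(prices) / len(prices)
    for genre, prices in prices_by_genre.items()
}

print(average_by_genre)  # {'Fiction': 10.0, 'Non Fiction': 14.0}
```

The keys and values of average_by_genre could then be passed as lists to the x and y arguments of bar().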

    Line charts

    Line charts are typically used to show how a variable (or variables) changes over time. With Plotly Express, it is incredibly easy to create a line chart.

    Let's plot the total number of reviews for each year.

    # Specify the data
    years = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
    total_reviews = [235506, 273981, 405041, 654546, 654907, 792997, 711669, 709800, 644420, 696521, 794917, 1790733, 2818117]
    
    # Initialize a line plot
    fig = px.line(x=years,
                  y=total_reviews
                 )
    
    # Placeholder for layout customization (an empty call changes nothing)
    fig.update_layout()
    
    # Show the plot
    fig.show()

    Scatter plots

    Scatter plots are similar to line plots, but the points are not joined by a line. They are a great way to visualize the relationship between two continuous variables when the observations have no natural order.

    Creating a scatter plot with Plotly is just as easy as creating a line chart.

    We will use a pandas DataFrame instead of Python lists to make things easier. A DataFrame is a data structure composed of labelled columns and rows, much like a spreadsheet.
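To make the idea of labelled columns and rows concrete, here is a small sketch that builds a DataFrame from the yearly review lists used in the line chart. The variable name reviews_df and the column names are our own choices for this illustration, not necessarily those of the webinar's dataset.

```python
import pandas as pd

# Reuse the yearly review totals from the line chart example
years = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]
total_reviews = [235506, 273981, 405041, 654546, 654907, 792997, 711669,
                 709800, 644420, 696521, 794917, 1790733, 2818117]

# Each dictionary key becomes a column label; each list becomes that column's values
reviews_df = pd.DataFrame({"year": years, "total_reviews": total_reviews})

print(reviews_df.head())  # first five rows
print(reviews_df.shape)   # (13, 2): 13 rows and 2 columns
```

With the data in a DataFrame, Plotly Express functions can take the DataFrame as their first argument and refer to columns by name, for example px.scatter(reviews_df, x="year", y="total_reviews").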