Skip to content
Analyzing online ticket sales with Amazon Redshift
  • AI Chat
  • Code
  • Report
  • Analyzing Online Ticket Sales with Amazon Redshift

    In this workspace, we will be accessing data stored in Amazon Redshift, a data warehouse product that is part of Amazon Web Services. More specifically, we'll be analyzing sales activity from a fictional ticketing website where users both buy and sell tickets online for sporting events, shows, and concerts (source).

    To consult the solution, head over to the file browser and select notebook-solution.ipynb.

    Explore events

    Unknown integration
    DataFrameavailable as
    df
    variable
    -- List all the events
    SELECT * FROM public.event
    

    This is linking up to several other tables in the warehouse, such as venue, category and date. Let's join things up.

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT *
    FROM event
    INNER JOIN public.venue USING(venueid)
    INNER JOIN public.category USING(catid)
    INNER JOIN public.date USING(dateid)
    LIMIT 100

    There's a starttime column coming from the event table and there's also a caldate column, coming from the date table. Let's see what's up with this.

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT
    	CASE WHEN DATE(caldate) = DATE(starttime) THEN True ELSE False END AS same_date,
    	COUNT(*)
    FROM event
    INNER JOIN date USING(dateid)
    GROUP BY 1
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT MAX(DATEDIFF('hour', caldate, starttime))
    FROM event
    INNER JOIN date USING(dateid)

    Let's see how much events are happening in different cities.

    Unknown integration
    DataFrameavailable as
    events_per_city
    variable
    SELECT 
    	venuecity,
        COUNT(*) AS num_event
    FROM event
    INNER JOIN venue USING(venueid)
    GROUP BY venuecity
    ORDER BY num_event DESC
    import plotly.express as px
    px.bar(events_per_city, x = 'venuecity', y = 'num_event')

    Explore listings and sales

    Unknown integration
    DataFrameavailable as
    df
    variable
    -- show 100 listing records
    SELECT * FROM public.listing
    LIMIT 100
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- show 100 sales records
    SELECT * FROM public.sales
    LIMIT 100

    Let's see if multiple sales can happen for the same listing.