Analyzing Online Ticket Sales with Amazon Redshift
In this workspace, we will be accessing data stored in Amazon Redshift, a data warehouse product that is part of Amazon Web Services. More specifically, we'll be analyzing sales activity from a fictional ticketing website where users both buy and sell tickets online for sporting events, shows, and concerts (source).
To consult the solution, head over to the file browser and select notebook-solution.ipynb
.
Explore events
-- List all the events
SELECT *
FROM event
This is linking up to several other tables in the warehouse, such as venue, category and date. Let's join things up.
SELECT *
FROM event
INNER JOIN venue USING(venueid)
INNER JOIN category USING(catid)
INNER JOIN date USING(dateid)
LIMIT 100
There's a starttime
column coming from the event
table and there's also a caldate
column, coming from the date
table. Let's see what's up with this.
SELECT
CASE WHEN DATE(starttime) = DATE(caldate) THEN TRUE ELSE FALSE END AS same_date,
COUNT(*)
FROM event
INNER JOIN date USING(dateid)
GROUP BY 1
SELECT MAX(DATEDIFF('HOUR', caldate, starttime))
FROM event
INNER JOIN date USING(dateid)
Let's see how much events are happening in different cities.
SELECT
venuecity,
COUNT(*) AS num_event
FROM event
INNER JOIN venue USING(venueid)
GROUP BY 1
ORDER BY 2 DESC
import plotly.express as px
px.bar(events_per_city, x='venuecity', y='num_event')
Explore listings and sales
-- show 100 listing records
SELECT *
FROM public.listing
LIMIT 100;
-- show 100 sales records
SELECT *
FROM sales
LIMIT 100;
Let's see if multiple sales can happen for the same listing.