Analyzing Ticket Sales Data with Amazon Redshift

Analyzing Online Ticket Sales with Amazon Redshift

In this workspace, we will be accessing data stored in Amazon Redshift, a data warehouse product that is part of Amazon Web Services. More specifically, we'll be analyzing sales activity from a fictional ticketing website where users both buy and sell tickets online for sporting events, shows, and concerts (source).

To consult the solution, head over to the file browser and select notebook-solution.ipynb.

Explore events

DataFrameas

all_events

variable

-- List all the events
SELECT *
FROM event;

This is linking up to several other tables in the warehouse, such as venue, category and date. Let's join things up.

DataFrameas

venue_category_date

variable

SELECT *
FROM event
INNER JOIN venue USING(venueid)
INNER JOIN category USING(catid)
INNER JOIN date USING(dateid)
LIMIT 100;

There's a starttime column coming from the event table and there's also a caldate column, coming from the date table. Let's see what's up with this.

DataFrameas

cal_date

variable

SELECT
	CASE WHEN DATE(caldate) = DATE(starttime) THEN True
	ELSE False
	END AS same_date,
	COUNT(*)
FROM event
INNER JOIN date USING(dateid)
GROUP BY 1;

We've got 703 dates from 'caldate' and 'starttime' that do not match. Let's try and see the maximum hour difference between these columns.

DataFrameas

max_date_diff

variable

SELECT MAX(DATEDIFF('hour', caldate, starttime))
FROM event
INNER JOIN date USING(dateid);

We've got a 20-hour maximum time difference between the 'caldate' and 'startime' columns. Since there isn't really a 24-hour difference between these columns, we'll leave it like that for now.

Let's see how much events are happening in different cities.

DataFrameas

events_per_city

variable

SELECT
	DISTINCT venuecity,
	COUNT(*) AS cnt
FROM event
INNER JOIN venue USING (venueid)
GROUP BY venuecity
ORDER BY cnt DESC;

import plotly.express as px

fig = px.bar(data_frame=events_per_city, x='venuecity', y='cnt', 
             color='venuecity', title='Event Counts per City',
            labels={
                'venuecity' : ' Venue City',
                'cnt' : 'Count'
            })

fig.show()

New York takes the top spot for the venue with the most number of events in this dataset.

Explore listings and sales

‌
‌
‌

Analyzing Ticket Sales Data with Amazon Redshift

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Analyzing Online Ticket Sales with Amazon Redshift

Explore events

Explore listings and sales

Analyzing Online Ticket Sales with Amazon Redshift