Skip to content

Exploring Ticket Sales with Redshift


1 hidden cell

Exploratory Data Analysis

Tables and Data Structure

Categories

The categories table holds information about different event categories. The attributes captured here are as follows:

  • catid: unique identifier for each category
  • catgroup: the group or type the category belongs to (e.g., sports, concerts, theater)
  • catname: the name of the category (e.g., Baseball, Rock Concert)
  • catdesc: a description of the category, providing more details about what it encompasses

The categories table allows for the analysis of trends in ticket sales in regards to different types of events. Gaining a better understanding of how different event categories perform can help businesses tailor their marketing strategies and inventory management to meet consumer demand more effectively.

Spinner
DataFrameas
categories
variable
select * from public.category

The following is a brief overview of the categories dataset, using the .describe() method. The output provides us with some key statistics of each column in the dataset, including the mean, max, and min values to understand how data is distributed. This also helps us identify any irregularities or missing values that need to be addressed in the cleaning stage before analysis.

Here are some key points we can infer:

  • The catid column, being the only numerical data in the categories table, is analyzed by the .describe() method. It shows the count, mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum values of the category IDs.
  • Since catgroup, catname, and catdesc are non-numeric, they are not included in the .describe() output. To investigate these columns, we would need to use other methods suitable for textual data analysis.
  • The count of catid indicates the total number of unique categories present in the dataset.
  • The minimum and maximum values of catid give us an idea of the range of category IDs.
  • The mean and standard deviation provide insights into the central tendency and dispersion of the category IDs, respectively.

It is important to note that the .describe() function primarily analyzes numerical data. Therefore, to thoroughly investigate the dataset, especially the non-numeric types like catgroup, catname, and catdesc, we will need to employ additional methods tailored for textual or categorical data analysis.

categories.describe()

Dates

The dates table holds information about possible dates when tickets were sold. The attributes captured here are as follows:

  • dateid: unique identifier
  • caldate: the respective datetime object
  • day: the day of the week that particular date falls on, abrreviated by the first two letters of the day
  • week: indicates which week out of the year (1-53) the date falls on
  • qtr: indicates which quarter of the year the date falls on
  • year: indicates which year the date falls on
  • holiday: boolean value indicating whether or not the respective day was a holiday

The dates table allows for the analysis of trends in ticket sales in regards to time-related factors such as seasonality, month, and day of the week. Gaining a better understanding of when ticket sales increase or decrease can drive decision-making for marketing or sales teams.

Spinner
DataFrameas
dates
variable
select * from public.date order by dateid
Hidden output

The following is a brief overview of the dates dataset, using the .describe() method. The output gives us some key statistics of each column in the dataset, including the mean, max, and min values to see how data is distributed. This also allows us to catch some irregularities or missing values we need to address in the cleaning stage before analysis.

Here are some key points we can see:

  • There are 365 rows or records in the table.
  • All dates were from the year 2008
  • Date ids begin at 1827 and go through 2191

It is important to note that the describe function only analyzes numerical data, so in order to investigate data in non-numeric types, we will need to use another method.

dates.describe()
Hidden output

On it's own, the dates dataset is not particularly useful, but we can join it to other tables where we want to know a bit more about the date attached to a record. The dateid is used as a foreign key in several of the tables we discuss here.

Events

The events table holds information about past events where tickets were sold. The attributes captured here are as follows:

  • eventid: unique identifier for each event.
  • venueid: identifier for the venue where the event took place.
  • catid: category identifier for the type of event.
  • dateid: identifier linking to the date the event occurred.
  • eventname: the name of the event.
  • starttime: the start time of the event.

The events table allows for the analysis of trends in ticket sales in regards to the timing, location, and type of events. Gaining a better understanding of when ticket sales change can help in predicting future sales, optimizing event scheduling, and tailoring marketing strategies to target audiences more effectively. By analyzing this data, organizations can identify peak sales periods, popular venues, and preferred event categories, enabling them to make informed decisions that enhance customer satisfaction and maximize revenue.

Spinner
DataFrameas
events
variable
select * from public.event order by eventid
Hidden output