Skip to content

Investigate Coffee Shop Sales

This sample dataset contains sales and transaction data from January through June. I downloaded the dataset from Kaggle, and utilized SQL to answer the following questions about the dataset:

  1. Do sales change by month?
  2. What product categories are people buying? Which are the most profitable?
  3. What product categories are being bought at what times throughout the day?
  4. Do sales change by the day of the week?
  5. How is each of the three company store locations performing?
  6. What are the 10 best-selling products?
  7. Rank the top selling products in each of the product categories.

Let's see what we can find out.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sales = pd.read_excel('coffee_shop_original.xlsx')
Spinner
DataFrameas
sales
variable
SELECT *
FROM sales

1. Do sales change by month?

The query below groups the number of transactions per month, as well as the total dollar sales per month.

Spinner
DataFrameas
monthly
variable
SELECT
	EXTRACT(month FROM transaction_date) AS month,
	COUNT(transaction_id) AS transactions,
	ROUND(SUM(transaction_qty * unit_price),0) AS sales
FROM sales
GROUP BY month
ORDER BY month

2. What product categories are people buying, and which are the most profitable?

The query below groups the number of transactions per product category, as well as total dollars sales by product category, then calculates dollars revenue per transaction.

Spinner
DataFrameas
category
variable
SELECT
	product_category,
	COUNT(transaction_id) AS transactions,
	ROUND(SUM(transaction_qty * unit_price),0) AS sales,
	ROUND(SUM(transaction_qty * unit_price) / COUNT(transaction_id), 2) AS dollars_per_transaction
FROM sales
GROUP BY product_category
ORDER BY sales DESC

3. What product categories are people buying and at what times are they buying them?

The query below is similar to the above query, however it takes that one step further. This query identifies total dollar sales per product category, per hour of the business day.

Spinner
DataFrameas
hourly
variable
SELECT
	EXTRACT(hour FROM CAST(transaction_time AS TIME)) AS hour,
	product_category,
	COUNT(transaction_id) AS transactions
FROM sales
GROUP BY hour, product_category
ORDER BY transactions DESC

4. What days are people buying?

The query below groups transaction counts by the day of week, and also sum of dollar sales by day of week.

Spinner
DataFrameas
dow
variable
SELECT
	EXTRACT(DOW FROM transaction_date) AS day_of_week,
	COUNT(transaction_id) AS count,
	ROUND(SUM(transaction_qty * unit_price),0) AS sales
FROM sales
GROUP BY EXTRACT(DOW FROM transaction_date)
ORDER BY day_of_week

5. How is each location performing?

The query below groups transaction count by store location, as well as total dollar sales by store location.

Spinner
DataFrameas
location
variable
SELECT
	store_location,
	COUNT(transaction_id) AS transactions,
	ROUND(SUM(transaction_qty * unit_price),0) AS sales
FROM sales
GROUP BY store_location
ORDER BY sales DESC

6. What are the top 10 best-selling products?

The query below provides a count of the top 10 selling products, in descending order. There is also a 'rank' column, produced by a window function.

Spinner
DataFrameas
top_types
variable
SELECT
	product_type,
	COUNT(product_type) AS count,
	RANK() OVER (ORDER BY count DESC) AS rank
FROM sales
GROUP BY product_type
LIMIT 10