Skip to content

SQL Sales Data Exploration

As a business analyst for a fictional online retail company, use the customers, products, and sales data to find the following information for company stakeholders using SQL query techniques. Assume today is the beginning of Q3 in 2024 or July 01, 2024.

  1. Find the sales transactions made in the last 30 days including customer name, product name, and total amount of the sale. Also find the total number of transactions, number of distinct customers, and total revenue during this period.
  2. Find the total revenue generated by each product category in 2023 from least to greatest.
  3. Find all customers who made purchases in 2023 and who are located in the West region so the marketing team can send them coupons for next quarter.
  4. Display the total number of transactions, total product quantity sold, and total revenue for each customer, in decreasing order by total revenue.
  5. Find the top 3 customers by total revenue in the year 2024 so they can receive Most Valuable Customer bonus products.
  6. Rank products by their total sales quantity in 2024 to identify the most popular items sold so far this year.
  7. Customers have the option of joining the Loyalty Program at any time. Categorize customers into "New", if their sign-up date for the Loyalty program is in the last 6 months or "Existing" otherwise. Where are most new loyalty program members located?
  8. Find the total transactions for each month in the last 12 months (Q3 2023 through Q2 2024).
  9. Identify the product categories that generated more than $4000 in revenue during the last 6 months (Q1 & Q2 of 2024).
  10. Check for any sales where the total amount in the sales data doesn't match the expected value of the sale using quantity x price.

Original datasets "customers", "products", and "sales" were created in Google Sheets for use in this project.

  • "customers" data includes: customer_id, customer_name, sales_region, sign_up_date
  • "products" data includes: product_id, product_name, category, price
  • "sales" data includes: sales_id, customer_id, product_id, sale_date, quantity, total_amount

Question 1

Find the sales transactions made in the last 30 days including customer name, product name, and total amount of the sale. Also find the total number of transactions, number of distinct customers, and total revenue during this period.

Answer There were 55 sales in the last 30 days to 32 distinct customers for a total revenue of $3320.

Spinner
DataFrameas
df
variable
SELECT c.customer_name, p.product_name, s.total_amount
FROM 'sales' as s
INNER JOIN 'customers' as c ON s.customer_id = c.customer_id
INNER JOIN 'products' as p ON s.product_id = p.product_id
WHERE DATEDIFF('day', CAST(s.sale_date AS DATE), CAST('2024-07-01' AS DATE) ) BETWEEN 0 and 31
Spinner
DataFrameas
df10
variable
SELECT COUNT(*) as total_transactions, 
COUNT(DISTINCT customer_name) as distinct_cust, 
SUM(total_amount) as total_revenue
FROM (SELECT c.customer_name, p.product_name, s.total_amount
	FROM 'sales' as s
	INNER JOIN 'customers' as c ON s.customer_id = c.customer_id
	INNER JOIN 'products' as p ON s.product_id = p.product_id
	WHERE DATEDIFF('day', CAST(s.sale_date AS DATE), CAST('2024-07-01' AS DATE) ) BETWEEN 0 and 31)

Question 2

Find the total revenue generated by each product category in 2023 from least to greatest.

Answer In the last year, Tech products generated $1346, Clothing products generated $2573, Toy products generated $2708, and Food products generated $3016. Tech products are significantly underperforming the other categories.

Spinner
DataFrameas
df1
variable
SELECT p.category, SUM(s.total_amount) as total_revenue
FROM 'sales' as s
INNER JOIN 'products' as p ON s.product_id = p.product_id
WHERE CAST(s.sale_date AS DATE) BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY p.category
ORDER BY SUM(s.total_amount) 

Question 3

Find all customers who made purchases in 2023 and who are located in the West region, so the marketing team can send them coupons for next quarter.

Answer In the West region, 11 distinct customers made purchases in 2023.

Spinner
DataFrameas
df2
variable
SELECT DISTINCT customer_id, customer_name
FROM 'customers'
INNER JOIN 
	(SELECT customer_id, sale_date 
	FROM 'sales' 
	WHERE YEAR(CAST(sale_date AS DATE))= 2023) USING (customer_id)
WHERE region = 'West'
ORDER BY customer_id

Question 4

Display the total number of transactions, total product quantity sold, and total revenue for each customer for 2023 & 2024, in decreasing order by total revenue.

Answer Evageline Ellis is the customer with the highest total revenue of $919 from buying 40 products in 14 separate transactions.

Spinner
DataFrameas
df
variable
SELECT customer_name, COUNT(id) as total_transactions, SUM(quantity) as total_prod, SUM(total_amount) as total_revenue 
FROM sales
LEFT JOIN customers USING (customer_id)
GROUP BY customer_name
ORDER BY SUM(total_amount) DESC

Note It is important to note that the last result is a customer entry that is missing the customer's name. Find the customer id and transaction information for this customer.

Answer Customer with id number 51 is missing their full name in the customers table. They made one transaction, a sale of 1 Product 26 for $22. Upon closer inspection, since there are only 50 customers in the customers dataset, this appears to be a typo and the Customer ID of 51 is incorrect. We would need to further investigate which customer made this purchase and correct the mistake in the dataset.

Spinner
DataFrameas
df3
variable
SELECT customer_name, * FROM sales 
FULL JOIN customers USING (customer_id)
WHERE customer_name IS NULL

Question 5

Find the top 3 customers by total revenue in the first half of 2024 so they can receive Most Valuable Customer bonus products.

Answer Since the sales data only contains transactions through the end of Quarter 2 in 2024, we can filter for just the year 2024 when identifying the top 3 customers.

Customers with id numbers 10, 28, and 25 generated the most total revenue in the year 2024.

Spinner
DataFrameas
df4
variable
SELECT customer_id, SUM(total_amount) as total_revenue
	FROM sales
WHERE YEAR(CAST(sale_date as DATE)) = 2024	
GROUP BY customer_id
	ORDER BY SUM(total_amount) DESC
LIMIT 3

Question 6

Rank products by their total sales quantity in 2024 to identify the most popular items sold so far this year.

Answer Product A is the most popular item in 2024, by the end of Q2. It sold 50 times.