Skip to content
Comprehensive Bike Sales Analysis with Advanced SQL Queries
Project Title: Comprehensive Bike Sales Analysis with Advanced SQL Queries
Description:
This project involved an in-depth analysis of a bike sales database, addressing critical business requirements such as tracking product sales, evaluating salesperson performance, and comparing store performance across three states over time. Over 60 SQL queries were meticulously crafted to explore and extract insights, employing advanced methodologies like hierarchical aggregation, correlated and nested subqueries, summary window functions, and statistical analyses. The project also delved into distributions of data, revealing patterns and trends in product prices, delivery performance, and customer purchasing behavior.
DataFrameas
df
variable
Run cancelled
/* BICYCLE SALES DATABASE */
-- Let's explore the schemas in the Bicycle Sales databases
-- Starting with Production schema
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'production';
Queryas
query2
variable
Run cancelled
-- Sales schema
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'sales'DataFrameas
df
variable
Run cancelled
-- Determine the data types for columns in the Sales schema and Production schema
SELECT table_schema, table_name, column_name, data_type
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'sales' OR TABLE_SCHEMA = 'production'
-- The data types varies from int, varchar, tinyint, date and decimalQueryas
query4
variable
Run cancelled
-- Select product name, brand name, category, list price, model year and
-- The aim is to extract information about bike products.
SELECT product_name, category_name, list_price, model_year, SUM(quantity) AS stocks_count
FROM production.categories AS c
INNER JOIN production.products AS p
ON c.category_id = p.category_id
INNER JOIN production.stocks AS s
ON p.product_id = s.product_id
GROUP BY product_name, category_name, list_price, model_year
-- Electra Townie Commute GO!" and "Electra Moto 3" are currently in stock with 75 items each and are the highest-priced items at $2,999.99.DataFrameas
df1
variable
Run cancelled
-- The products are named and phrased in a specific format some products are named with brand located at the front of the text
-- Find out whether this is true
SELECT SUBSTRING(p.product_name, 1, LEN(b.brand_name)) AS trimmed_brand_name, b.brand_name, p.product_name
FROM production.categories AS c
INNER JOIN production.products AS p
ON c.category_id = p.category_id
INNER JOIN production.stocks AS s
ON p.product_id = s.product_id
LEFT JOIN production.brands AS b
on p.brand_id = b.brand_id
WHERE p.product_name NOT LIKE COALESCE(b.brand_name + '%', p.product_name)
-- Turns out all of the records returns no null value, indicating that the product name contains the substring of the brand name at the first sentencesDataFrameas
df2
variable
Run cancelled
-- Similarly, display trimmed brand name and product name for all products
SELECT SUBSTRING(p.product_name, 1, LEN(b.brand_name)) AS trimmed_brand_name, p.product_name
FROM production.categories AS c
INNER JOIN production.products AS p
ON c.category_id = p.category_id
INNER JOIN production.stocks AS s
ON p.product_id = s.product_id
LEFT JOIN production.brands AS b
on p.brand_id = b.brand_id
WHERE p.product_name LIKE COALESCE(b.brand_name + '%', p.product_name)
DataFrameas
df3
variable
Run cancelled
-- Besides the brand name, the product name also contained strings of which, presumambly, indicated the year it was manufactued
-- The year string was explicitly stated in the end of the string
SELECT REVERSE(SUBSTRING(REVERSE(p.product_name), 1, CHARINDEX(' ', REVERSE(p.product_name)))) AS trimmed_product_name, p.model_year, CHARINDEX(' ', REVERSE(p.product_name)) AS char_length_year
FROM production.categories AS c
INNER JOIN production.products AS p
ON c.category_id = p.category_id
INNER JOIN production.stocks AS s
ON p.product_id = s.product_id
LEFT JOIN production.brands AS b
on p.brand_id = b.brand_id
ORDER BY char_length_year DESC;
-- Results shown that some years containing more than a single year only corresponds with the year at the end of string, this suggest a pattern in the data that needs to be addressedQueryas
query8
variable
Run cancelled
-- How many stores are there listed in the stocks table
SELECT store_id, COUNT(store_id) AS stores_count
FROM production.stocks
GROUP BY store_id;
-- Currently, there are 3 stores id and total is 939 stores country-wideQueryas
query9
variable
Run cancelled
-- How many items are in stock in the stock table?
SELECT store_id, SUM(quantity) AS total_stocks
FROM production.stocks
GROUP BY store_id;
-- Store id 3 has the highest total number of items in stockQueryas
query10
variable
Run cancelled
-- By looking at the schema, what kind of relationship between the column manager_id and staff_id have in common? Are there any unique relationships between these two columns?
-- First, let's find out the number of staff members assigned for each manager.
SELECT manager_id, COUNT(staff_id) AS staff_count
FROM sales.staffs
GROUP BY manager_id
-- There is one staff that currently have no managers attached to FROMDataFrameas
df4
variable
Run cancelled
-- Explore the relationship between manager_id and staff_id in the staffs table further using the ROLLUP method
SELECT
COALESCE(CAST(manager_id AS VARCHAR), 'No Manager') AS manager_id,
COUNT(staff_id) AS staff_count
FROM
sales.staffs
GROUP BY
ROLLUP(manager_id)Queryas
query12
variable
Run cancelled
-- Find products without corresponding entries in the stocks table
SELECT p.product_id, p.product_name
FROM production.products AS p
LEFT JOIN production.stocks AS st
ON p.product_id = st.product_id
WHERE st.product_id IS NULL;
-- If we left join table products with table stocks, the return will be 947 items
-- However, about 8 products whose product id is not found in the stocks table
DataFrameas
df5
variable
Run cancelled
-- Find products and their character lengths
SELECT p.product_id, p.product_name, LEN(p.product_name) AS char_length
FROM production.products AS p
LEFT JOIN production.stocks AS st
ON p.product_id = st.product_id
ORDER BY char_length DESC;
Queryas
query14
variable
Run cancelled
-- Check the stores table from the sales schema
-- This time, we would like to see the store_id's name
SELECT s.store_id, s.store_name, st.quantity AS stocks, s.street, s.city, s.state
FROM production.stocks AS st
RIGHT JOIN sales.stores AS s
ON st.store_id = s.store_id;