Skip to content

Bicycle Sales

Source: DataCamp DataLab Bicycle Sales Microsoft SQL Server sample database

Intro: This dataset looks at bike orders at a fictional company. It gives information about customers, sales representatives, order details, product details along with other information including stocks of bikes etc. In this project, I aim to extract, transform and load the dataset to produce a dashboard to give an overview of the performance of the company, showing important takeaway highlights suchs as revenue.

I have outlined the steps I took during this project below. You can find the code I used for steps 1 - 3 in the attached document "Code for Steps" (Invalid URL)

Business questions:

  1. What were the top 5 selling products and categories?
  2. What were the trends in revenue over time? What was revenue distributed by different factors such as bike category?
  3. Where are most customers located?
  4. How many orders were shipped on-time and how many were late?

STEP 1.

I did an initial check on the data by checking if the data types of each field were appropraite (e.g. a date field should have a DATE datatype, an id column should be INT etc...) using the code shown in the next cell. Then I used the column names and their corresponding data types to build 2 star schemas, namely Production Star Schema and Sales Star Schema. Please find these two Star schemas attached to this workbook in the file named "Star Schemas"

Spinner
DataFrameas
df2
variable
--sample code for getting columns and datatypes
Select column_name, data_type 
from information_schema.columns 
where table_name = 'brands' -- I changed table names

STEP 2.

Next, I checked that each table had a primary key that had a unique ID for each row. The Products, Brands, Categories, Customers, Orders, Stores and Staffs table each had their own primary key. However, the Stocks table and Order_items table didn't have a primary key that uniquely defined each row. The code below demonstrates how I determined whether there was a primary key in the tables.

Spinner
DataFrameas
df3
variable
--Example of the code I used to check if ID is unique for each row in a table:
select * from production.products --checking number of rows
select count(distinct(product_id)) from production.products --checking the distinct number of id's
--If there is the name number of unique id's as there are number of rows in the original table, this means that each row has a uniquely assigned ID

--Code I used for the Stocks table which had two different ID columns to check if any of them were primary keys
select * from production.stocks	--returns 939 rows
select count(distinct(store_id)) from production.stocks	--returns 3 so not a primary key
select count(distinct(product_id)) from production.stocks --returns 313 so not primary key

--Code I used for the Staffs table which had three different ID columns to check if any of them were primary keys:
select * from sales.order_items --returns 4722 rows
select count(distinct(order_id)) as 'disctinct count of order_id' from sales.order_items --returns 1615 so not primary key
select count(distinct(item_id)) as 'distinct count of item_id' from sales.order_items --returns NULL so not primary key
select count(distinct(product_id)) as 'distinct count of product_id' from sales.order_items --returns NULL so not primary key

STEP 3.

Next, I wrote an SQL query to retrieve a dataset with the information needed for analysis.

In SQL the dataset has 4,722 rows. When the dataset was exported to Excel it had 4,722 rows.

Spinner
DataFrameas
df
variable
-- extracting dataset

select o.order_id, o.order_date, o.required_date, o.shipped_date, c.customer_id, c.city, c.state, sum(i.quantity) as 'total_bikes_sold', sum(i.quantity * i.list_price)  as 'revenue', p.product_name, p.model_year, cat.category_name, b.brand_name, s.store_name
from sales.orders as o
inner join sales.customers as c
on o.customer_id = c.customer_id
inner join sales.order_items as i
on o.order_id = i.order_id
inner join production.products as p
on i.product_id = p.product_id
inner join production.categories as cat
on p.category_id = cat.category_id
inner join production.brands as b
on b.brand_id = p.brand_id
inner join sales.stores as s
on o.store_id = s.store_id

group by o.order_id, o.order_date, o.required_date, o.shipped_date, c.customer_id, c.city, c.state, p.product_name, p.model_year, cat.category_name, b.brand_name, s.store_name

STEP 4.

Then I used Excel Power Query Editor to profile the data e.g. to see if there were any NULL values. Since the data types for each column was "General", I gave each column the appropriate data types. I used all columns to check for duplicates by using Excel's "Remove Duplicates" feature.

Please find attached file 'Screenshots - Profiling data' for more detail.

You may also view the dataset in attached file 'Bike sales Final after Power Query Editor'

STEP 5.

Next, I loaded the dataset into Tableau to create a dashboard.

Follow this link to view dashboard: https://public.tableau.com/app/profile/kamilia.aitbrahim/viz/BikesalesFinalAnalysis/Dashboard1

Note about Tableau dashboard:

In the Revenue and Sales by Month graph, for the year 2018, there is a gap in May. This is explained as there is no data recorded for any orders for May.

Also, there is a sudden decrease in sales and revenue in June. This is because no data was recorded for June except for one order.

Please refer to attached file '2018 irregular trend in revenue and sales by month'

The fact that there were near to 0 sales in May and June suggests the company may have been shut for a period of time or had encountered a problem which lead to profound difficulty selling. The sudden decrease in sales after from July to December 2018 also suggests that the company continued to struggle selling products. The company may have been becoming bankrupt after which they may have stopped selling bikes altogether.

Key Insights:

  1. Top 5 selling products were:
    1. 296 Electra Cruiser 1 (24-inch) - 2016
    2. 290 Electra Townie Original 7D EQ - 2016
    3. 289 Electra Townie Original 21D - 2016
    4. 269 Electra Girl's Hawaii 1 (16-inch) - 2015/2016
    5. 167 Surly Ice Cream Truck Frameset - 2016

Top 5 selling categories:

  1. Cruisers Bicycles
  2. Mountain Bikes
  3. Children Bikes
  4. Comfort Bicycles
  5. Road Bikes
  1. In 2016, revenue stayed more or less even throughout the year at around 200,000 dollars with the peak revenue of 303,283 dollars occuring in September.

In 2017, revenue also stayed more or less even throughout the year at around 300,000 dollars, with the peak revenue of 419,892 dollars occuring in June.

In 2018, revenue had a significant increase from February to April, followed by no revenue in May.The revenue was drastically lower between June to December.

  1. Most customers were located in New York.

  2. 3,371 orders were shipped on time whilst 1,351 orders were shipped late.