Skip to content

Superstore Sales Data Exploration & Cleaning

Due to the nature of sales and company privacy, I wasn't able to pull any real sales data from online, so I had to look to Kaggle for a dataset I could use to clean and build a dashboard.

I landed on the Superstore Dataset.

To start, I separated the single file into Sales Info, Customer Info, and Product Info .csv files in order to practice a little SQL before building the dashboard.

We'll start by simply joining all three tables to get a look at the full dataset:

Spinner
DataFrameas
df
variable
SELECT * FROM 'superstore_sales_info.csv'
LEFT JOIN 'superstore_customer_info.csv'
ON "Customer ID" = "customer_id"
LEFT JOIN join 'superstore_product_info.csv'
ON "Product ID" = "product_id";

Next, we'll find the top-5 products by quantity sold:

Spinner
DataFrameas
df1
variable
SELECT product_name, sum(Quantity) as SumQuant
FROM 'superstore_sales_info.csv'
LEFT JOIN 'superstore_product_info.csv'
ON "Product ID" = "product_id"
GROUP BY product_name
ORDER BY SumQuant DESC
LIMIT 5;

Let's also find the top-5 products by profit:

Spinner
DataFrameas
df2
variable
SELECT product_name, sum(Profit) as SumProfit
FROM 'superstore_sales_info.csv'
LEFT JOIN 'superstore_product_info.csv'
ON "Product ID" = "product_id"
GROUP BY product_name
ORDER BY SumProfit DESC
LIMIT 5;

Now let's combine all of the data sources into a single dataset that includes columns for the:

  • Year of the order_date
  • Month of the order_date
  • Season of the order_date
Spinner
DataFrameas
df3
variable
select *, date_part('year', "Order Date") as Year, date_part('month', "Order Date") as Month,
case when date_part('month', "Order Date") in (12,1,2) then 'Winter'
     when date_part('month', "Order Date") in (3,4,5) then 'Spring'
     when date_part('month', "Order Date") in (6,7,8) then 'Summer'
     when date_part('month', "Order Date") in (9,10,11) then 'Fall'
     else 'Unknown' end as Season
from "superstore_sales_info.csv"
left join "superstore_customer_info.csv"
on "Customer ID" = customer_id
left join "superstore_product_info.csv"
on "Product ID" = product_id;

I then exported that final table to an Excel file in order upload it to Tableau and work with the data in there.

  • You can see the resulting dashboard here.
  • Or check out more of my work here.

Thanks!