You're working for a company that sells motorcycle parts, and they've asked for some help in analyzing their sales data!
They operate three warehouses in the area, selling both retail and wholesale. They offer a variety of parts and accept credit cards, cash, and bank transfer as payment methods. However, each payment type incurs a different fee.
The board of directors wants to gain a better understanding of wholesale revenue by product line, and how this varies month-to-month and across warehouses. You have been tasked with calculating net revenue for each product line and grouping results by month and warehouse. The results should be filtered so that only "Wholesale" orders are included.
They have provided you with access to their database, which contains the following table called sales:
Sales
Column | Data type | Description |
---|---|---|
order_number | VARCHAR | Unique order number. |
date | DATE | Date of the order, from June to August 2021. |
warehouse | VARCHAR | The warehouse that the order was made from: North, Central, or West. |
client_type | VARCHAR | Whether the order was Retail or Wholesale. |
product_line | VARCHAR | Type of product ordered. |
quantity | INT | Number of products ordered. |
unit_price | FLOAT | Price per product (dollars). |
total | FLOAT | Total price of the order (dollars). |
payment | VARCHAR | Payment method: Credit card, Transfer, or Cash. |
payment_fee | FLOAT | Percentage of total charged as a result of the payment method. |
Your query output should be presented in the following format:
product_line | month | warehouse | net_revenue |
---|---|---|---|
product_one | --- | --- | --- |
product_one | --- | --- | --- |
product_one | --- | --- | --- |
product_one | --- | --- | --- |
product_one | --- | --- | --- |
product_one | --- | --- | --- |
product_two | --- | --- | --- |
... | ... | ... | ... |
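The net revenue asked for above is the order total with the payment fee taken out. As a minimal sketch of that arithmetic, assuming `payment_fee` is stored as a fraction of `total` (0.03 meaning 3%), which the table description suggests but does not state outright; the helper name `net_revenue` and the sample orders are hypothetical:

```python
def net_revenue(total: float, payment_fee: float) -> float:
    """Order total after the payment fee is deducted.

    Assumes payment_fee is a fraction of total (0.03 means 3%).
    """
    return total * (1 - payment_fee)


# Hypothetical orders, not taken from the sales table:
# (total in dollars, payment_fee as a fraction)
orders = [(100.0, 0.03), (250.0, 0.01), (80.0, 0.0)]

net_total = sum(net_revenue(total, fee) for total, fee in orders)
print(round(net_total, 2))
```

If the fee were instead stored as a whole-number percentage (3 for 3%), the factor would need to be `1 - payment_fee / 100`.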
# Import python libraries
import matplotlib.pyplot as plt
import seaborn as sns
# Set options
# Set style and palette together: a separate sns.set() call resets the style to its default
sns.set_theme(style='whitegrid', palette='tab10')
plt.rcParams['axes.titleweight']='bold'
Previewing the dataset
-- Previewing the dataset
SELECT *
FROM sales
LIMIT 5;
-- Total quantity sold per product line
SELECT product_line,
sum(quantity) AS total_products
FROM sales
GROUP BY product_line;
sns.catplot(x='product_line',y='total_products', data = df, kind= 'bar')
plt.xticks(rotation = 45)
plt.title('Number of products')
plt.ylabel('Total products sold')
plt.xlabel('Product line')
plt.show()
-- Distribution of orders across warehouses (num_warehouses is the order count per warehouse)
SELECT warehouse,
count(warehouse) AS num_warehouses
FROM sales
GROUP BY warehouse;
plt.pie(df1.num_warehouses, labels = df1.warehouse, startangle=90, explode = [0.02,0.02,0.02], autopct= '%1.1f%%')
plt.title('Warehouse segmentation')
plt.show()
Quantity sold by product line and client type
SELECT product_line,
client_type,
sum(quantity) AS total_products
FROM sales
GROUP BY product_line,
client_type
ORDER BY total_products DESC;
sns.barplot(y='product_line', x='total_products', data=df2, hue='client_type')
plt.show()
Wholesale net revenue by product line, month, and warehouse
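SELECT product_line,
-- 'FMMonth' gives a capitalized month name without the blank padding that 'month' adds
to_char(date, 'FMMonth') AS month,
warehouse,
-- payment_fee is a share of total, assumed stored as a fraction (e.g. 0.03 for 3%)
sum(total * (1 - payment_fee)) AS net_revenue
FROM sales
WHERE client_type = 'Wholesale'
GROUP BY product_line,
month,
warehouse
ORDER BY product_line,
month,
net_revenue DESC;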
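To sanity-check the grouping logic outside the database, the same filter-and-aggregate steps can be sketched in plain Python. This is only an illustration under stated assumptions: the rows are made up rather than taken from the sales table, `payment_fee` is treated as a fraction of `total`, and the names `MONTH_NAMES` and `net_by_group` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# The dataset covers June to August 2021
MONTH_NAMES = {6: "June", 7: "July", 8: "August"}

# Made-up rows: (date, warehouse, client_type, product_line, total, payment_fee)
rows = [
    (date(2021, 6, 1), "North", "Wholesale", "Engine", 1000.0, 0.03),
    (date(2021, 6, 15), "North", "Wholesale", "Engine", 500.0, 0.0),
    (date(2021, 6, 20), "North", "Retail", "Engine", 200.0, 0.01),
    (date(2021, 7, 2), "West", "Wholesale", "Engine", 300.0, 0.01),
]

net_by_group = defaultdict(float)
for order_date, warehouse, client_type, product_line, total, fee in rows:
    if client_type != "Wholesale":
        continue  # mirrors WHERE client_type = 'Wholesale'
    key = (product_line, MONTH_NAMES[order_date.month], warehouse)
    net_by_group[key] += total * (1 - fee)  # fee assumed to be a fraction

for key, value in sorted(net_by_group.items()):
    print(key, round(value, 2))
```

The retail order is dropped by the filter, so only two groups remain: the two June wholesale orders from North are summed into one row, and the July order from West forms the other.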