Welcome! You are now in DataLab.

You successfully completed your project and are looking for some additional related challenges. This DataLab workbook contains the official solution from our curriculum staff, along with Additional Challenges at the bottom. If you would like a quick overview of DataLab, please refer to the help menu. You can easily share your project with your friends and colleagues when you're done.

Good luck with your additional challenges!

You're working for a company that sells motorcycle parts, and they've asked for some help in analyzing their sales data!

They operate three warehouses in the area, selling both retail and wholesale. They offer a variety of parts and accept credit cards, cash, and bank transfers as payment methods. However, each payment type incurs a different fee.

The board of directors wants to gain a better understanding of wholesale revenue by product line, and how this varies month-to-month and across warehouses. You have been tasked with calculating net revenue for each product line and grouping results by month and warehouse. The results should be filtered so that only "Wholesale" orders are included.

They have provided you with access to their database, which contains the following table called sales:

Sales

| Column | Data type | Description |
| order_number | VARCHAR | Unique order number. |
| date | DATE | Date of the order, from June to August 2021. |
| warehouse | VARCHAR | The warehouse that the order was made from: North, Central, or West. |
| client_type | VARCHAR | Whether the order was Retail or Wholesale. |
| product_line | VARCHAR | Type of product ordered. |
| quantity | INT | Number of products ordered. |
| unit_price | FLOAT | Price per product (dollars). |
| total | FLOAT | Total price of the order (dollars). |
| payment | VARCHAR | Payment method: Credit card, Transfer, or Cash. |
| payment_fee | FLOAT | Percentage of total charged as a result of the payment method. |

Your query output should be presented in the following format:

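| product_line | month | warehouse | net_revenue |
| product_one | --- | --- | --- |
| product_one | --- | --- | --- |
| product_one | --- | --- | --- |
| product_one | --- | --- | --- |
| product_one | --- | --- | --- |
| product_one | --- | --- | --- |
| product_two | --- | --- | --- |
| ... | ... | ... | ... |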

Result available as DataFrame variable: revenue_by_product_line

-- Net wholesale revenue by product line, month, and warehouse
WITH truncated_to_month AS (
	SELECT
		product_line,
		EXTRACT(MONTH FROM date) AS month,
		warehouse,
		total - payment_fee AS revenue
	FROM sales
	WHERE client_type = 'Wholesale'
)

SELECT
	product_line,
	CASE 
		WHEN month = 6 THEN 'June'
		WHEN month = 7 THEN 'July'
		WHEN month = 8 THEN 'August'
	END AS month,
	warehouse,
	SUM(revenue) AS net_revenue
FROM truncated_to_month
GROUP BY product_line, month, warehouse
ORDER BY product_line, month, net_revenue DESC;
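
One way to sanity-check the query above is to run it against a few hand-made rows. The sketch below is a minimal example under stated assumptions, not the DataLab environment: the rows are invented for illustration, only a subset of the sales columns is created, and it uses Python's built-in sqlite3 module. SQLite has no EXTRACT, so strftime('%m', date) stands in for EXTRACT(MONTH FROM date). The names SAMPLE_ROWS, QUERY, and wholesale_net_revenue are hypothetical.

```python
import sqlite3

# Hypothetical sample rows, invented for illustration only.
# Columns: order_number, date, warehouse, client_type, product_line, total, payment_fee
SAMPLE_ROWS = [
    ("A1", "2021-06-01", "North", "Wholesale", "Engine", 100.0, 0.5),
    ("A2", "2021-06-15", "North", "Wholesale", "Engine", 200.0, 0.0),
    ("A3", "2021-06-20", "North", "Retail", "Engine", 50.0, 0.5),
    ("A4", "2021-07-02", "Central", "Wholesale", "Engine", 300.0, 1.0),
]

QUERY = """
WITH by_month AS (
    SELECT
        product_line,
        -- SQLite stand-in for EXTRACT(MONTH FROM date)
        CAST(strftime('%m', date) AS INTEGER) AS month_number,
        warehouse,
        total - payment_fee AS revenue
    FROM sales
    WHERE client_type = 'Wholesale'
)
SELECT
    product_line,
    CASE month_number
        WHEN 6 THEN 'June'
        WHEN 7 THEN 'July'
        WHEN 8 THEN 'August'
    END AS month,
    warehouse,
    SUM(revenue) AS net_revenue
FROM by_month
GROUP BY product_line, month_number, warehouse
ORDER BY product_line, month, net_revenue DESC;
"""


def wholesale_net_revenue(rows):
    """Load rows into an in-memory table and return the grouped result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (order_number TEXT, date TEXT, warehouse TEXT, "
            "client_type TEXT, product_line TEXT, total REAL, payment_fee REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in wholesale_net_revenue(SAMPLE_ROWS):
        print(row)
```

With these rows the Retail order is filtered out and the two June orders from North collapse into one group. Because the month is returned as text, ORDER BY month sorts the names alphabetically, so July comes before June.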

Extended Project below

The finance team is exploring ways to reduce transaction costs and improve profitability. They’ve asked you to determine the most profitable payment method for each warehouse in each month. Calculate the net revenue for each payment method, grouped by warehouse and month, and identify the top payment method for each combination.


Result available as DataFrame variable: df

WITH truncated_to_month AS (
	SELECT
		EXTRACT(MONTH FROM date) AS month,
		warehouse,
		payment,
		total - payment_fee AS revenue
	FROM sales
),
aggregated_by_month_per_payment_and_warehouse AS (
	SELECT
		CASE 
			WHEN month = 6 THEN 'June'
			WHEN month = 7 THEN 'July'
			WHEN month = 8 THEN 'August'
		END AS month,
		warehouse,
		payment,
		SUM(revenue) AS net_revenue
	FROM truncated_to_month
	GROUP BY payment, month, warehouse
),
ranked_aggregated_by_month_per_payment_and_warehouse AS (
	SELECT
		*,
		RANK() OVER(PARTITION BY warehouse, month ORDER BY net_revenue DESC) AS rank
	FROM aggregated_by_month_per_payment_and_warehouse
)

SELECT
	warehouse,
	month,
	payment,
	net_revenue
FROM ranked_aggregated_by_month_per_payment_and_warehouse
WHERE rank = 1
ORDER BY warehouse, month;
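
The ranking step relies on RANK() OVER (PARTITION BY ...), which gives tied rows the same rank, so a warehouse and month can return more than one top payment method. The sketch below illustrates this on invented, already-aggregated rows using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later); the table name payment_revenue and the function top_payment_methods are hypothetical.

```python
import sqlite3

# Hypothetical aggregated rows, invented for illustration only.
# Columns: warehouse, month, payment, net_revenue
SAMPLE_AGGREGATES = [
    ("North", "June", "Cash", 100.0),
    ("North", "June", "Transfer", 250.0),
    ("North", "June", "Credit card", 250.0),  # ties with Transfer
    ("West", "June", "Cash", 80.0),
    ("West", "June", "Transfer", 40.0),
]

RANK_QUERY = """
SELECT warehouse, month, payment
FROM (
    SELECT
        *,
        RANK() OVER (
            PARTITION BY warehouse, month
            ORDER BY net_revenue DESC
        ) AS revenue_rank
    FROM payment_revenue
) AS ranked
WHERE revenue_rank = 1
ORDER BY warehouse, month, payment;
"""


def top_payment_methods(rows):
    """Return every payment method ranked first within its warehouse and month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE payment_revenue "
            "(warehouse TEXT, month TEXT, payment TEXT, net_revenue REAL)"
        )
        conn.executemany("INSERT INTO payment_revenue VALUES (?, ?, ?, ?)", rows)
        return conn.execute(RANK_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_payment_methods(SAMPLE_AGGREGATES):
        print(row)
```

If exactly one row per group is required, ROW_NUMBER() with an explicit tie-breaker in the ORDER BY is a common alternative to RANK().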

The marketing team is planning a targeted campaign and wants to know the most popular product lines for retail and wholesale customers.

They have asked you to find the top 3 most ordered product lines for each client type.


Result available as DataFrame variable: df1

-- "Most ordered" is interpreted here as total units ordered (SUM(quantity)).
WITH quantity_by_product_and_client AS (
	SELECT
		product_line,
		client_type,
		SUM(quantity) AS total_quantity
	FROM sales
	GROUP BY product_line, client_type
),
quantity_by_product_and_client_ranked AS (
	SELECT
		*,
		-- DESC so that rank 1 is the most ordered product line
		RANK() OVER(PARTITION BY client_type ORDER BY total_quantity DESC) AS rank
	FROM quantity_by_product_and_client
)

SELECT
	client_type,
	product_line
FROM quantity_by_product_and_client_ranked
WHERE rank <= 3
ORDER BY client_type, rank;
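
A top-N-per-group query depends on the sort direction inside the window: rank 1 goes to the largest value only when the window is ordered DESC. The sketch below shows the pattern on invented rows with Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). It assumes "most ordered" means total units ordered, and the product line names, SAMPLE_ORDERS, and top_product_lines are hypothetical.

```python
import sqlite3

# Hypothetical order rows, invented for illustration only.
# Columns: client_type, product_line, quantity
SAMPLE_ORDERS = [
    ("Retail", "Engine", 10),
    ("Retail", "Brakes", 10),
    ("Retail", "Brakes", 20),
    ("Retail", "Tyres", 20),
    ("Retail", "Lights", 5),
    ("Wholesale", "Engine", 100),
    ("Wholesale", "Brakes", 50),
    ("Wholesale", "Tyres", 70),
    ("Wholesale", "Lights", 90),
]

TOP_N_QUERY = """
WITH totals AS (
    SELECT client_type, product_line, SUM(quantity) AS total_quantity
    FROM sales
    GROUP BY client_type, product_line
),
ranked AS (
    SELECT
        *,
        -- DESC: the largest total_quantity gets rank 1
        RANK() OVER (
            PARTITION BY client_type
            ORDER BY total_quantity DESC
        ) AS quantity_rank
    FROM totals
)
SELECT client_type, product_line, total_quantity
FROM ranked
WHERE quantity_rank <= ?
ORDER BY client_type, quantity_rank;
"""


def top_product_lines(rows, n=3):
    """Return the n most ordered product lines for each client type."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (client_type TEXT, product_line TEXT, quantity INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_N_QUERY, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_product_lines(SAMPLE_ORDERS):
        print(row)
```

Dropping DESC from the window would return the three least ordered lines instead. Swapping SUM(quantity) for COUNT(*) would rank by number of orders rather than units.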