Skip to content

Motoparts

Motoparts is an imaginary company that sells motorcycle parts, and they've asked for some help in analysing their sales data!

They operate three warehouses in Australia, selling both retail and wholesale. They offer a variety of parts and accept credit cards, cash, and bank transfer as payment methods. However, each payment type incurs a different fee.

The board of directors wants to gain a better understanding of wholesale revenue by product line, and how this varies month-to-month and across warehouses. I have been tasked with calculating net revenue for each product line and grouping results by month and warehouse. The results should be filtered so that only "Wholesale" orders are included.

They have provided me with access to their database, which contains the following table called sales:

Schema

ColumnData typeDescription
order_numberVARCHARUnique order number.
dateDATEDate of the order, from June to August 2021.
warehouseVARCHARThe warehouse that the order was made from— North, Central, or West.
client_typeVARCHARWhether the order was Retail or Wholesale.
product_lineVARCHARType of product ordered.
quantityINTNumber of products ordered.
unit_priceFLOATPrice per product (dollars).
totalFLOATTotal price of the order (dollars).
paymentVARCHARPayment method—Credit card, Transfer, or Cash.
payment_feeFLOATPercentage of total charged as a result of the payment method.

Requirements

They have requested the query output be presented in the following format:

product_linemonthwarehousenet_revenue
product_one---------
product_one---------
product_one---------
product_one---------
product_one---------
product_one---------
product_two---------
............

Solution

Let's get started.

After assessing the schema for the sales table, and considering the expected output for this query, it is clear that we are only interested in the following columns:

  • product_line (unchanged)
  • date (converted to mmmm) AS month
  • warehouse (unchanged)
  • total & payment fee (SUM(total) - SUM(payment_fee)) AS net_revenue

We are only interested in records where:

  • client_type is "Wholesale"

The output then needs to be ordered by product_line, month and net_revenue.

Understanding the Data

Lets start simple and run an ordered select all

Spinner
DataFrameas
all
variable
SELECT * FROM public.sales

ORDER BY product_line, date, total DESC

Interesting...

Some observations on the selected data:

  • There are exactly 1000 rows.
  • order_number appears to be comprised of the first letter of warehouse followed by a unique id number.
  • date appears to be stored as datetime
  • The "Credit card" payment method incurs a 3c fee, "Transfer" incurs 1c and "Cash" incurs no fee.

Cleaning

lets cull the records and columns we don't need and work on our custom columns, month and net_revenue, before we start grouping.

Spinner
DataFrameas
cleaned
variable
SELECT
product_line,
to_char(date, 'Month') AS month,
warehouse,
total - payment_fee AS net_revenue

FROM public.sales

WHERE client_type = 'Wholesale'

ORDER BY product_line, "month", "net_revenue" DESC

Great!

We have the desired records and columns using the requested schema, ordered in the correct sequence.

Final Output: Grouping & Aggregating

Time to GROUP BY and aggregate our net_revenue.

Our output should then be ready to submit to the board of directors.

Spinner
DataFrameas
revenue_by_product_line
variable
SELECT
product_line,
to_char(date, 'Month') AS month,
warehouse,
SUM(total) - SUM(payment_fee) AS net_revenue

FROM public.sales

WHERE client_type = 'Wholesale'

GROUP BY product_line, "month", warehouse
ORDER BY product_line, "month", "net_revenue" DESC

Concluding Notes

We now have an indication of the wholesale product line performance by month and warehouse in the requested format, however I have some concerns regarding the readability and usefulness of this output.

It is quite difficult to compare the performance of product lines against eachother using a common dimension such as month or warehouse.

For example; has a particular product line improved or weakened over time? Are there product lines that excel in some warehouses but not others?

Having understood the objectives of the board of directors, it may be worth going above and beyond their original request to visualise wholesale product line performance. These visualisations will provide greater context and supplement their strategic analysis.