Skip to content
0

📖 Background

You work in the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.

You’ve recently learned data manipulation and plotting, and suggest helping your colleague analyze past sales data. Your colleague wants to capture sales by payment method. She also needs to know the average unit price for each product line.

💾 The data

The team assembled the following file:

The sales data has the following fields:
  • "date" - The date, from June to August 2021.
  • "warehouse" - The company operates three warehouses: North, Central, and West.
  • "client_type" - There are two types of customers: Retail and Wholesale.
  • "product_line" - Type of products purchased.
  • "quantity" - How many items were purchased.
  • "unit_price" - Price per item sold.
  • "total" - Total sale = quantity * unit_price.
  • "payment" - How the client paid: Cash, Credit card, Transfer.
library(tidyverse)
library(knitr)

df <- readr::read_csv('./data/sales_data.csv')
kable(head(df))

💪 Challenge

Create a report to answer your colleague's questions. Include:

1. What are the total sales for each payment method?

library(knitr)
kable(df %>%
  group_by(payment) %>%
  summarize(per_method_total_sales=sum(total)) )

  df %>%
  group_by(payment) %>%
  summarize(per_method_total_sales=sum(total)) %>%
  ggplot(aes(payment,per_method_total_sales)) +
  geom_col() + theme_minimal() + xlab("Payment method") + ylab("Total sales") + 
    scale_y_continuous(breaks = seq(0,160000,25000),limits = c(0,160000))
  

2. What is the average unit price for each product line?

 df %>%
  group_by(product_line) %>%
  summarize(avg=mean(unit_price)) %>%
  ggplot(aes(product_line,avg,label=round(avg,0))) +
  geom_point() + theme_minimal() + xlab("Product line") + ylab("Average unit price") + ylim(c(0,65)) + 
  geom_label()

3. [Optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)

# average purchase value by client type
kable(df %>%
  group_by(client_type) %>%
  summarize(avg=mean(total)))

# total purchase value by product line
kable(df %>%
  group_by(product_line) %>%
  summarize(sum=sum(total)))

4. Summarize your findings.

  1. The customers prefer the transfer method for payment,
  2. On average, the products of the line "Engine" are more expensive than the others while the products of the line "Breaking system" are cheaper,
  3. On average, wholesale sales perform better than retail
  4. The products of the line "Suspension & traction" leads the board of the total purchase value

✅ Checklist before publishing into the competition

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.Rmd.
  • Remove redundant cells like the introduction to R Markdown notebooks, so the workbook is focused on your story
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!