# Importing the pandas module
import pandas as pd

# Reading in the sales data
df = pd.read_csv('data/sales_data.csv', parse_dates=['date'])

# Take a look at the first datapoints
df.head()

💾 The data

The sales data has the following fields:
  • "date" - The date, from June to August 2021.
  • "warehouse" - The company operates three warehouses: North, Central, and West.
  • "client_type" - There are two types of customers: Retail and Wholesale.
  • "product_line" - Type of products purchased.
  • "quantity" - How many items were purchased.
  • "unit_price" - Price per item sold.
  • "total" - Total sale = quantity * unit_price.
  • "payment" - How the client paid: Cash, Credit card, Transfer.
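
The "total" definition above can be sanity-checked before any aggregation. The sketch below is only an illustration: the three rows in `sample` are hypothetical values that follow the documented schema, not rows read from `sales_data.csv`, and the 0.005 tolerance is an assumption to absorb float rounding.

```python
import pandas as pd

# Hypothetical rows mirroring the documented schema; NOT taken from sales_data.csv.
sample = pd.DataFrame({
    "quantity": [8, 2, 5],
    "unit_price": [16.85, 19.29, 32.81],
    "total": [134.80, 38.58, 164.05],
})

# Recompute quantity * unit_price and compare it with the stored total,
# allowing a small tolerance (assumed here) for floating-point rounding.
expected_total = (sample["quantity"] * sample["unit_price"]).round(2)
mismatches = sample[(sample["total"] - expected_total).abs() > 0.005]

print(f"{len(mismatches)} row(s) where total != quantity * unit_price")
```

Running the same comparison on `df` instead of `sample` would flag any rows where the stored total disagrees with its definition.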

💪 Challenge

Create a report to answer your colleague's questions. Include:

  1. What are the total sales for each payment method?
  2. What is the average unit price for each product line?
  3. Create plots to visualize findings for questions 1 and 2.
  4. [Optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
  5. Summarize your findings.

** SALES DATA ANALYSIS REPORT **

  • Total sales by payment method: Cash 19199.10, Credit card 110271.57, Transfer 159642.33.
    Transfer accounts for the highest total sales, followed by Credit card and Cash.
  • The average unit price is highest for the engine product line and lowest for the
    breaking system product line.
  • The suspension & traction product line has the highest total purchase value.
  • Wholesale clients purchase more than Retail clients.
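
To put the first finding in proportion, the reported totals can be turned into shares of overall sales. This is a minimal sketch that reuses the three figures quoted in the report; the names `total_sales` and `share_pct` are illustrative and do not appear in the original analysis.

```python
import pandas as pd

# Totals copied from the report above.
total_sales = pd.Series(
    {"Cash": 19199.10, "Credit card": 110271.57, "Transfer": 159642.33},
    name="total",
)

# Share of overall sales per payment method, as a percentage.
share_pct = (total_sales / total_sales.sum() * 100).round(1)
print(share_pct.sort_values(ascending=False))
```

With the real data, the same calculation could start from `df.groupby('payment')['total'].sum()` instead of the hard-coded figures.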
# Determine total sales for each payment method.
Total_sales = df.groupby('payment')['total'].sum()
Total_sales
# What is the average unit price for each product line?
Avg_unit_price = df.groupby('product_line')['unit_price'].mean()
Avg_unit_price
# Create a plot to visualize total sales for each payment method.
import matplotlib.pyplot as plt
Total_sales.plot(kind='barh')
plt.show()
# Create a plot of the average unit price for each product line.
Avg_unit_price.plot(kind='barh')
plt.show()
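
The bar charts above are easier to read in a report with axis labels and a title. The sketch below shows one possible way to do that, under the assumption that the payment totals from the report are used as input; the label and title strings are suggestions, not part of the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Totals copied from the report; with the real data, use
# df.groupby('payment')['total'].sum() instead.
totals = pd.Series({"Cash": 19199.10, "Credit card": 110271.57, "Transfer": 159642.33})

# Sort so the largest bar appears at the top of the horizontal chart.
fig, ax = plt.subplots()
totals.sort_values().plot(kind="barh", ax=ax)

# Label text is illustrative; adjust units and wording as needed.
ax.set_xlabel("Total sales")
ax.set_ylabel("Payment method")
ax.set_title("Total sales by payment method")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, a `plt.show()` call would be needed afterwards.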
# Determine average purchase value by client type
# (mean of 'total' per client type; summing 'quantity' would give item counts, not purchase value)
Avg_pur_value = df.groupby('client_type')['total'].mean()
Avg_pur_value
Avg_pur_value.plot(kind='barh')
plt.show()
# Determine total purchase value by product_line
Total_client_pur = df.groupby('product_line')['total'].sum().sort_values(ascending=True)
Total_client_pur
Total_client_pur.plot(kind='barh')
plt.show()