Reporting on sales data
Now let's now move on to the competition and challenge.
📖 Background
You work in the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.
You’ve recently learned data manipulation and plotting, and suggest helping your colleague analyze past sales data. Your colleague wants to capture sales by payment method. She also needs to know the average unit price for each product line.
💾 The data
The sales data has the following fields:
- "date" - The date, from June to August 2021.
- "warehouse" - The company operates three warehouses: North, Central, and West.
- "client_type" - There are two types of customers: Retail and Wholesale.
- "product_line" - Type of products purchased.
- "quantity" - How many items were purchased.
- "unit_price" - Price per item sold.
- "total" - Total sale = quantity * unit_price.
- "payment" - How the client paid: Cash, Credit card, Transfer.
💪 Challenge
Create a report to answer your colleague's questions. Include:
- What are the total sales for each payment method?
- What is the average unit price for each product line?
- Create plots to visualize findings for questions 1 and 2.
- [Optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
- Summarize your findings.
Importing and exploring the data
Fistly, pandas is imported and the data-sales is retrived from csv file. Before we conduct the analysis,let us examine the nature of the data.
# import data
import pandas as pd
sales= pd.read_csv("data/sales_data.csv")
sales.info()
sales.isna().sum()
sales.describe()
As we see from top, there is no missing value in the sales data.In addition,we get some statistical overview for the data to understand its nature more.Now we can start answering our colleague's questions.
Analizing total sales for each payment method
In order to get total sales for payment methods, we use "groupby" to group payment column then by "agg" we find total sales for each.To graph dataframe,we import pyplot.
#total sales for each payment method
sales_for_each_p= sales.groupby("payment").agg({"total":"sum"})
print(sales_for_each_p)
#plot total sales for each payment method
from matplotlib import pyplot as plt
sales_for_each_p.plot(kind="barh",title=("Total sales for payment methods"),width=0.8,
color='g')
plt.show()Figure shows that,clients mostly prefer transfer as payment method.They are unlikely to use cash as a payment method. Is this client preferance change by the warehouse types? To answer this question we can make seperate graphs for warehouse types: central,north and west.Graph shows prefences are not differentiate.However, the gap between credit cards payers and transfer payers are very little in west warehouse.
sales_t= pd.read_csv("data/sales_data.csv",parse_dates=["date"],index_col="date")
sales_c =sales_t[sales_t["warehouse"]=="Central"]
sales_n =sales_t[sales_t["warehouse"]=="North"]
sales_w =sales_t[sales_t["warehouse"]=="West"]
#payment types for warehouse central
sales_for_each_pay_central= sales_c.groupby("payment").agg({"total":"sum"})
print(sales_for_each_pay_central)
#payment types for warehouse north
sales_for_each_pay_north= sales_n.groupby("payment").agg({"total":"sum"})
print(sales_for_each_pay_north)
#payment types for warehouse west
sales_for_each_pay_west= sales_w.groupby("payment").agg({"total":"sum"})
print(sales_for_each_pay_west)
#subplots for each
fig,ax=plt.subplots(3,1,sharey=True,sharex=True,figsize=(8,8))
ax[0].bar(sales_for_each_pay_central.index,sales_for_each_pay_central["total"],label="Central")
ax[1].bar(sales_for_each_pay_north.index,sales_for_each_pay_north["total"],label="North")
ax[2].bar(sales_for_each_pay_west.index,sales_for_each_pay_west["total"],label="West")
ax[0].set_ylabel("Central")
ax[1].set_ylabel("North")
ax[2].set_ylabel("West")
ax[0].set_title("Total sales on Central warehouse by different methods ",fontsize=10)
ax[1].set_title("Total sales on North warehouse by different methods ",fontsize=10)
ax[2].set_title("Total sales on West warehouse by different methods ",fontsize=10)
plt.show()
Analizing average unit prices for each product line
In order to get unit prices for each product line,once again, we use "groupby" to group product line column then by "agg" we find mean unit prices for each.Then,we sort values in ascending order.
#average unit price for each product line
average_uprice_prod= sales.groupby("product_line").agg({"unit_price":"mean"})
s_avg=average_uprice_prod.sort_values("unit_price",ascending=True)
print(s_avg)
After plotting,we can observe that average of the engine unit price(60.09) is the highest while breaking system unit price(17.74) is the lowest.