AN ANALYSIS OF BIG MART'S SALES DATA

BIG MART SALES DATA ANALYSIS

#importing the necessary libraries and reading the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
mart = pd.read_csv('big_mart_sales.csv')
mart.tail(50)
## getting descriptive statistics of the data
mart.describe()
mart.columns
## checking for missing values
mart.isna().sum()
## dealing with missing values in Item_Weight column by replacing them with the mode weight
mart['Item_Weight'] = mart['Item_Weight'].fillna(mart['Item_Weight'].mode()[0])
mart.isna().sum()
## examining the distribution of outlet sizes to decide how to handle NaN values in the Outlet_Size column
mart[['Outlet_Type','Outlet_Size','Outlet_Location_Type','Outlet_Identifier']].value_counts()

'High' size outlets are the least common (932), followed by 'Small' size outlets (2388); 'Medium' size outlets are the most common (2793).

  • Because 'Small' and 'Medium' outlets dominate, either is a reasonable value for replacing the NaN entries in Outlet_Size.
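
The reasoning above can be checked on the size column alone. The following is a minimal sketch on a small made-up frame (the values in `toy` are hypothetical and are not taken from `big_mart_sales.csv`): it tallies each size with the NaN rows included, then fills the gaps with the mode.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Outlet_Size column
toy = pd.DataFrame({
    'Outlet_Size': ['Medium', 'Small', np.nan, 'Medium',
                    'High', np.nan, 'Small', 'Medium']
})

# dropna=False keeps the NaN rows in the tally so the gap is visible
size_counts = toy['Outlet_Size'].value_counts(dropna=False)
print(size_counts)

# mode() ignores NaN and returns a Series; take its first entry
most_common = toy['Outlet_Size'].mode()[0]

# Replace every NaN with the most common size
filled = toy['Outlet_Size'].fillna(most_common)
print(filled.value_counts())
```

Filling with a single value inflates the count of that category, so it is worth comparing the tallies before and after the fill.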
## filling missing Outlet_Size values with the most common value (the mode), i.e. 'Medium'
mart['Outlet_Size'] = mart["Outlet_Size"].fillna(mart['Outlet_Size'].value_counts().index[0])
mart.isna().sum()
## total MRP per item type, sorted so the bars appear in ascending order
table0 = pd.pivot_table(data = mart, index = 'Item_Type', aggfunc = {'Item_MRP' : 'sum'})
table0 = table0.sort_values('Item_MRP')
table0.plot(kind = 'bar', title = 'Total MRP by item type')
plt.show()

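Fruits and vegetables and snack foods have the highest total maximum retail price (MRP), summed over all items of each type. Because this is a sum, it reflects how many items each type contains as well as how expensive they are.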
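
A summed MRP and an average MRP can rank item types differently. The sketch below illustrates this on made-up numbers (the `toy` frame is hypothetical, and `groupby` is used here instead of `pivot_table` only to compute both aggregates in one step).

```python
import pandas as pd

# Hypothetical toy data: many cheaper snacks, few pricier household items
toy = pd.DataFrame({
    'Item_Type': ['Snack Foods'] * 4 + ['Household'] * 2,
    'Item_MRP': [50.0, 60.0, 40.0, 50.0, 90.0, 80.0],
})

# Named aggregation: one column for the sum, one for the mean
summary = toy.groupby('Item_Type')['Item_MRP'].agg(
    mrp_sum='sum',
    mrp_mean='mean',
)
print(summary)

# The type with the largest total need not have the largest average
print(summary['mrp_sum'].idxmax(), summary['mrp_mean'].idxmax())
```

In this toy case the snack category has the larger total simply because it has more rows, while the household category has the higher average price.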

## using pivot tables to dive into the sales statistics
table1 = pd.pivot_table(data = mart, index = 'Item_Type', aggfunc = {'Item_Outlet_Sales' : 'sum'})
table1 = table1.sort_values('Item_Outlet_Sales')
table1.plot(kind = 'bar', title = 'Total revenue per item category')
plt.show()

Fruits and vegetables, followed closely by snack foods, are the item types generating the most sales, that is, contributing the most revenue.

  • This is plausible, as these are foods that are bought frequently, often daily, and they also have the highest total MRP of all item types.
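
To make "contributing the most revenue" concrete, each category's share of total revenue can be computed from the same kind of pivot table. This is a minimal sketch on made-up numbers (the `toy` frame and the `share` column name are hypothetical).

```python
import pandas as pd

# Hypothetical toy data standing in for the sales columns
toy = pd.DataFrame({
    'Item_Type': ['Fruits and Vegetables', 'Snack Foods', 'Dairy',
                  'Snack Foods', 'Fruits and Vegetables'],
    'Item_Outlet_Sales': [300.0, 250.0, 100.0, 150.0, 200.0],
})

# Total revenue per item type
revenue = pd.pivot_table(data=toy, index='Item_Type',
                         values='Item_Outlet_Sales', aggfunc='sum')

# Rank from highest to lowest revenue; sort_values returns a new frame
ranked = revenue.sort_values('Item_Outlet_Sales', ascending=False)

# Fraction of total revenue contributed by each item type
ranked['share'] = ranked['Item_Outlet_Sales'] / ranked['Item_Outlet_Sales'].sum()
print(ranked)
```

Note that `sort_values` does not modify the table in place, so its result has to be assigned before plotting if the bars should appear in order.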