💪 Challenge
Using either Tableau or Power BI, create an interactive dashboard to summarize your research. Things to consider:
- Use this Workspace to prepare your data (optional).
- Some ideas to get you started: visualize how shipments are delayed, by country, product, and over time. Analyze products by their supply versus demand ratio. Rank products by over or understock. Don't feel limited by these, you're encouraged to use your skills to consolidate as much information as possible.
- Create a screenshot of your (main) Tableau or Power BI dashboard, and paste that into the designated field.
- Summarize your findings in an executive summary.
import pandas as pd
data = pd.read_csv("data/orders_and_shipments.csv", encoding = 'UTF-8')
#Lets see the columns and how they appear
data.sample(n=25)
data.info()We have some issues with the columns names, lets take a look and fix these.
data.columns = data.columns.str.replace(" ",'')
data.columns = data.columns.str.lower()
data.columns
Lets get the date of order all together in one column and do the same with the shipment and brig them all to datetime
import datetime as dt
data = data.rename(columns={'orderyear':'year','ordermonth':'month','orderday':'day'})
data['order_date'] = pd.to_datetime(data[['year','month','day']])
data['order_date'] = data['order_date'].dt.strftime("%Y-%m-%d")
data = data.rename(columns={'shipmentyear':'Year','shipmentmonth':'Month','shipmentday':'Day'})
data['shipment_date'] = pd.to_datetime(data[['Year','Month','Day']])
data['shipment_date'] = data['shipment_date'].dt.strftime("%Y-%m-%d")
data['shipment_date'] = pd.to_datetime(data['shipment_date'])
data['order_date'] = pd.to_datetime(data['order_date'])
data.info()
data.head()Another column we have to modify the type is the discount, we have a ' - ' record in this columns that means the purchase had no discount, so lets replace it to zero (0)
data['discount%'].unique()
data['discount%'] = data['discount%'].str.replace(' - ', '0')
data['discount%'] = data['discount%'].astype('float')
# Now we'll check if there is any register anomalous
data.describe()We're going to create a column with values 0 and 1. 1 meaning that the shipment arrived whitin the scheduled time. 0 meaning that it was delayed First let's create a new column named 'days' with the values of time in days that has taken to arrive the product
# Check if there is any NaN value
data.isna().sum()
data['days'] = (data['shipment_date'] - data['order_date']).dt.days
#Now we create the check with values 0-1
data['on_time'] = 100
a = -1
for (i,j) in zip(data['days'], data['shipmentdays-scheduled']):
a = a +1
if j >= i:
data['on_time'][a] = 1
elif j < i:
data['on_time'][a] = 0
data.sample(n=50)
# In the days columns we can see that are some issues, the shipment_date cannot be sooner than the order_date. So, we should look into it to analyze better
(data.loc[data['days'] < 0].shape[0])/(data.shape[0])*100
# We have 2735 rows with this error. Or 8.86% of the total dataset.
data.loc[data['days'] < 0]['customerid'].nunique()
data.loc[data['days'] < 0]['productdepartment'].nunique()
data.loc[data['days'] < 0]['orderquantity'].nunique()
data.loc[data['days'] < 0]['shipment_date'].nunique()
data.loc[data['days'] < 0]['order_date'].nunique()
#Is not possible to see a pattern in the error to adjust, so we're going to drop these rows in order to get the most trustful info.
data_cleaned = data.loc[~data['days'] < 0]
data_cleaned.sample(n=20)Let's take a look into the inventory dataset
inventory = pd.read_csv('data/inventory.csv')
inventory.head(n=15)
inventory.shape# First of all, it is good to change the columns names to something more simple
inventory.columns = inventory.columns.str.replace(" ",'')
inventory.columns = inventory.columns.str.lower()
inventory.columns
#Change the type of columns is needed as well
inventory['yearmonth'] = inventory['yearmonth'].astype('str')
# Create a column with the date
inventory['month'] = inventory['yearmonth'].str[-2:]
inventory['year'] = inventory['yearmonth'].str[:-2]
inventory['date'] = inventory['month']+'-'+inventory['year']
inventory['date'] = pd.to_datetime(inventory['date']).dt.strftime("%Y-%m")
inventory.head(10)