Superstore Capstone
Data Description
About the dataset:
row_id: unique row identifier
order_id: unique order identifier
order_date: date the order was placed
ship_date: date the order was shipped
ship_mode: how the order was shipped
customer_id: unique customer identifier
customer_name: customer name
segment: segment of product
country: country of customer
city: city of customer
state: state of customer
postal_code: postal code of customer
region: Superstore region represented
product_id: unique product identifier
category: category of product
sub_category: subcategory of product
product_name: name of product
sales: total sales of that product in the order
quantity: total units sold of that product in the order
discount: percent discount applied for that product in the order
profit: total profit for that product in the order
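The field names above are written in snake_case, while the code in this notebook indexes columns by headers such as "Order Date" and "Postal Code". The following is a minimal sketch of how the two spellings relate, assuming the CSV headers are title-case versions of the names listed above. The helper `to_snake_case` and the sample frame are hypothetical and are not part of the original notebook, which keeps the original headers throughout.

```python
import pandas as pd

def to_snake_case(columns):
    """Map headers like 'Order Date' or 'Sub-Category' to 'order_date', 'sub_category'."""
    return [c.strip().lower().replace("-", "_").replace(" ", "_") for c in columns]

# Tiny illustrative frame with assumed header spellings, not the real dataset
sample = pd.DataFrame(columns=["Row ID", "Order Date", "Sub-Category", "Postal Code"])
sample.columns = to_snake_case(sample.columns)
print(list(sample.columns))
```

Renaming is optional: it only makes the column labels match the description, and any later cell that uses the original headers would need the same change.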
!pip install -q seaborn --upgrade
!pip install -q pandas --upgrade

import datetime as dt
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
from operator import attrgetter
import calendar

df = pd.read_csv("capstone_superstore.csv")
df

Data Cleaning
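df.dtypes

# Drop the "Unnamed: 0" column (typically a leftover index from an earlier CSV export)
df.drop("Unnamed: 0", axis=1, inplace=True)

# Parse the date columns and store postal codes as strings
df["Order Date"] = pd.to_datetime(df["Order Date"])
df["Ship Date"] = pd.to_datetime(df["Ship Date"])
df["Postal Code"] = df["Postal Code"].astype(str)

df.dtypes

# Count missing values per column and list fully duplicated rows
df.isnull().sum()

df[df.duplicated()]

# After astype(str), missing postal codes appear as the literal string "nan"
df[df["Postal Code"] == "nan"]

df.loc[df["Postal Code"] == "nan", "Postal Code"] = "05049"

# Confirm that no "nan" postal codes remain
df[df["Postal Code"] == "nan"]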
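Casting a column that contains missing values with `astype(str)` turns each missing entry into the literal string "nan", which is why the filters above compare against "nan". If pandas loaded the column as floats, the same cast may also yield values such as "5401.0" and lose leading zeros. The following is a sketch of one possible alternative, not the notebook's method: the helper `clean_postal_code` is hypothetical, and the sample values are invented for illustration.

```python
import numpy as np
import pandas as pd

def clean_postal_code(series, fill_value):
    """Return 5-character postal code strings, filling missing entries with fill_value."""
    # Nullable integers drop the ".0" suffix while keeping missing values as <NA>
    as_int = pd.to_numeric(series, errors="coerce").astype("Int64")
    # Convert to strings and restore any leading zeros lost during numeric parsing
    as_str = as_int.astype("string").str.zfill(5)
    return as_str.fillna(fill_value)

# Invented sample: one ordinary code, one that lost its leading zero, one missing
sample = pd.Series([42420.0, 5401.0, np.nan])
cleaned = clean_postal_code(sample, "05049")
print(cleaned.tolist())
```

Another option, assuming the raw file stores postal codes as text, is to pass `dtype={"Postal Code": str}` to `pd.read_csv` so the values are never parsed as numbers in the first place.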