Superstore Capstone
Data Description
About the dataset:
row_id: unique row identifier
order_id: unique order identifier
order_date: date the order was placed
ship_date: date the order was shipped
ship_mode: how the order was shipped
customer_id: unique customer identifier
customer_name: customer name
segment: segment of product
country: country of customer
city: city of customer
state: state of customer
postal_code: postal code of customer
region: Superstore region represented
product_id: unique product identifier
category: category of product
sub_category: subcategory of product
product_name: name of product
sales: total sales of that product in the order
quantity: total units sold of that product in the order
discount: percent discount applied for that product in the order
profit: total profit for that product in the order
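The field names above are written in snake_case, while the code in this notebook refers to headers such as "Order Date" and "Postal Code". A minimal sketch of one way to map between the two, assuming the raw headers use spaces and hyphens (the sample header list and the to_snake_case helper are illustrative, not taken from the file):

```python
import pandas as pd

# Hypothetical sample of raw headers; the real CSV may differ.
raw = pd.DataFrame(columns=["Row ID", "Order Date", "Sub-Category", "Postal Code"])

def to_snake_case(columns):
    """Lower-case the headers and replace runs of spaces/hyphens with one underscore."""
    return (
        columns.str.strip()
        .str.lower()
        .str.replace(r"[\s\-]+", "_", regex=True)
    )

renamed = raw.copy()
renamed.columns = to_snake_case(raw.columns)
print(list(renamed.columns))
```

The rest of the notebook keeps the original headers, so this renaming is optional.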
!pip install -q seaborn --upgrade
!pip install -q pandas --upgrade
import datetime as dt
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
from operator import attrgetter
import calendar
df = pd.read_csv("capstone_superstore.csv")
df
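An "Unnamed: 0" column, like the one dropped during cleaning, typically appears when a CSV was saved together with its index. As a sketch, assuming that is what happened here, passing index_col=0 to pd.read_csv avoids creating the column in the first place. The CSV text below is toy data standing in for the real file:

```python
import io
import pandas as pd

# Toy CSV whose first header cell is empty, as produced by DataFrame.to_csv() with its index.
csv_text = ",Order ID,Sales\n0,CA-1,10.5\n1,CA-2,20.0\n"

# Default read: the unnamed first column becomes "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# Using the first column as the index leaves only the real data columns.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra.columns.tolist())
print(without_extra.columns.tolist())
```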
Data Cleaning
df.dtypes
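# Drop the unnamed index column picked up from the CSV.
df = df.drop(columns="Unnamed: 0")
# Parse the date columns so they support datetime arithmetic.
df["Order Date"] = pd.to_datetime(df["Order Date"])
df["Ship Date"] = pd.to_datetime(df["Ship Date"])
# Treat postal codes as labels, not numbers; missing values become the string "nan".
df["Postal Code"] = df["Postal Code"].astype(str)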
df.dtypes
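Calling pd.to_datetime without a format lets pandas infer one, which can silently misread day-first dates. The sketch below shows a stricter variant on toy rows. The month/day/year format string is an assumption about the file (use "%d/%m/%Y" if the dates are day-first), and the sanity check on shipping order is an illustrative addition:

```python
import pandas as pd

# Toy rows; the values and the date format are assumptions, not taken from the dataset.
orders = pd.DataFrame({
    "Order Date": ["11/08/2017", "06/12/2017", "not a date"],
    "Ship Date": ["11/11/2017", "06/10/2017", "06/16/2017"],
})

for col in ["Order Date", "Ship Date"]:
    # An explicit format removes day/month ambiguity;
    # errors="coerce" turns unparseable values into NaT instead of raising.
    orders[col] = pd.to_datetime(orders[col], format="%m/%d/%Y", errors="coerce")

# Rows where either date failed to parse.
unparsed = orders[orders["Order Date"].isna() | orders["Ship Date"].isna()]

# Rows that appear to ship before they were ordered.
shipped_early = orders[orders["Ship Date"] < orders["Order Date"]]

print(len(unparsed), len(shipped_early))
```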
df.isnull().sum()
df[df.duplicated()]
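df.duplicated() flags only the second and later copies of a row. A small sketch on toy data shows how to count duplicates, view every copy, and drop the extras should any turn up:

```python
import pandas as pd

# Toy data with one fully duplicated row.
toy = pd.DataFrame({
    "Order ID": ["CA-1", "CA-1", "CA-2"],
    "Product ID": ["P-1", "P-1", "P-2"],
    "Sales": [10.0, 10.0, 5.0],
})

# Number of rows that repeat an earlier row exactly.
n_dupes = int(toy.duplicated().sum())

# keep=False marks every copy, which makes the duplicates easier to inspect.
all_copies = toy[toy.duplicated(keep=False)]

# Keep the first occurrence of each row and renumber the index.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(all_copies), len(deduped))
```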
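# Missing postal codes were turned into the string "nan" by astype(str) above.
df[df["Postal Code"] == "nan"]
# Fill them with a fixed postal code, kept as a string to preserve the leading zero.
df.loc[df["Postal Code"] == "nan", "Postal Code"] = "05049"
# Confirm that no "nan" placeholders remain (expects an empty result).
df[df["Postal Code"] == "nan"]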
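One caveat about the conversion above: if pandas read the postal codes as floats (which it does when a numeric column contains missing values), astype(str) yields strings such as "42420.0" and drops leading zeros. The following is a sketch of an alternative, assuming a float column; the toy values are illustrative. It goes through the nullable Int64 dtype and pads to five digits:

```python
import numpy as np
import pandas as pd

# Toy column shaped like a float postal code column with one missing value.
toy = pd.DataFrame({"Postal Code": [42420.0, 5401.0, np.nan]})

# The naive cast keeps the ".0" suffix and hides the missing value as "nan".
naive = toy["Postal Code"].astype(str).tolist()

# Record missing values before any cast, while isna() can still see them.
missing = toy["Postal Code"].isna()

codes = (
    toy["Postal Code"]
    .astype("Int64")    # nullable integer: 5401.0 -> 5401, NaN -> <NA>
    .astype("string")   # "5401", with <NA> preserved
    .str.zfill(5)       # restore leading zeros: "05401"
)

print(naive)
print(codes.tolist())
```

With this approach missing codes stay as <NA>, so they can be filled with fillna rather than matched against the string "nan".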