Superstore Capstone

Data Description

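About the dataset (the cleaning code refers to columns by their CSV headers, e.g. "Order Date" for order_date and "Postal Code" for postal_code):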

row_id: unique row identifier
order_id: unique order identifier
order_date: date the order was placed
ship_date: date the order was shipped
ship_mode: how the order was shipped
customer_id: unique customer identifier
customer_name: customer name
segment: customer segment the order belongs to
country: country of customer
city: city of customer
state: state of customer
postal_code: postal code of customer
region: Superstore region represented
product_id: unique product identifier
category: category of product
sub_category: subcategory of product
product_name: name of product
sales: total sales of that product in the order
quantity: total units sold of that product in the order
discount: discount applied to that product in the order, stored as a fraction of the price (e.g. 0.2 for 20%)
profit: total profit for that product in the order
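The descriptions above use snake_case names, while the cleaning code indexes the CSV headers directly. If snake_case names are preferred, a minimal sketch like the following could normalize the headers. The toy frame and its headers (including "Row ID" and "Sub-Category") are assumptions for illustration, not read from the real file.

```python
import pandas as pd

# Toy frame standing in for the real CSV; these headers are assumed to
# follow the "Order Date" / "Postal Code" pattern used in the cleaning code
raw = pd.DataFrame(columns=["Row ID", "Order Date", "Postal Code", "Sub-Category"])


def to_snake(columns: pd.Index) -> pd.Index:
    """Lower-case headers and collapse runs of non-alphanumerics into '_'."""
    return (
        columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )


raw.columns = to_snake(raw.columns)
print(list(raw.columns))  # ['row_id', 'order_date', 'postal_code', 'sub_category']
```

The notebook keeps the original headers, so renaming would also require updating the later cells that index by "Order Date" and "Postal Code".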

!pip install -q seaborn --upgrade
!pip install -q pandas --upgrade

import datetime as dt
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

from operator import attrgetter
import calendar

df = pd.read_csv("capstone_superstore.csv")
df
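Several of the cleaning steps below (dropping "Unnamed: 0", converting the date columns, and turning Postal Code into text) can instead be handled while reading the file. The sketch uses a two-row in-memory CSV as a stand-in; its values and ISO date layout are made up, and the real file's date format is not checked here.

```python
import io

import pandas as pd

# Made-up stand-in for capstone_superstore.csv: a blank-headed leading
# index column, two date columns, and one missing postal code
csv_text = """,Order Date,Ship Date,Postal Code
0,2016-11-08,2016-11-11,05049
1,2016-06-12,2016-06-16,
"""

df_demo = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,                              # use the saved index instead of an "Unnamed: 0" column
    parse_dates=["Order Date", "Ship Date"],  # parse dates while reading
    dtype={"Postal Code": str},               # keep leading zeros instead of floats like 5049.0
)

print(df_demo.dtypes)
print(df_demo["Postal Code"].tolist())
```

With this approach the missing postal code stays a real NaN rather than the string "nan", so isnull() counts it and fillna() can fill it.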

Data Cleaning

df.dtypes

# "Unnamed: 0" is a blank-headed leading column, most likely an index saved along with the CSV
df.drop("Unnamed: 0", axis=1, inplace=True)

df["Order Date"] = pd.to_datetime(df["Order Date"])
df["Ship Date"] = pd.to_datetime(df["Ship Date"])
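# Postal Code is read as float64 because it has missing values, so a plain astype(str)
# would give strings like "42420.0"; format as zero-padded 5-digit text, keeping "nan" for gaps
df["Postal Code"] = df["Postal Code"].map(lambda z: "nan" if pd.isna(z) else f"{int(z):05d}")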
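Once both date columns are datetime64, a cheap sanity check on the conversion is that no order ships before it was placed. This sketch uses three made-up rows (the last one deliberately wrong) and a hypothetical "Days to Ship" column.

```python
import pandas as pd

# Made-up rows with already-parsed dates; the third row is a deliberate error
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(["2016-11-08", "2016-06-12", "2017-01-05"]),
    "Ship Date": pd.to_datetime(["2016-11-11", "2016-06-16", "2017-01-02"]),
})

# Subtracting datetime64 columns gives timedelta64; .dt.days extracts whole days
orders["Days to Ship"] = (orders["Ship Date"] - orders["Order Date"]).dt.days

# Rows where the shipment precedes the order point to a parsing or data error
bad = orders[orders["Days to Ship"] < 0]
print(orders["Days to Ship"].tolist())  # [3, 4, -3]
print(len(bad))  # 1
```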

df.dtypes

df.isnull().sum()
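Because Postal Code is converted to str before this check, its missing values are stored as the text "nan", and isnull() no longer counts them; that is why they are looked up by string in the next steps. A minimal sketch that counts both kinds of gap, on a made-up frame that mimics that state:

```python
import pandas as pd

# Made-up frame mimicking the notebook after the str conversion:
# the missing postal code is the text "nan", the missing city is a real None
demo = pd.DataFrame({"Postal Code": ["42420", "nan"], "City": ["City A", None]})

print(demo.isnull().sum().to_dict())  # {'Postal Code': 0, 'City': 1}

# Count genuine missing values plus the "nan" placeholder strings
missing = demo.isnull() | demo.eq("nan")
print(missing.sum().to_dict())  # {'Postal Code': 1, 'City': 1}
```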

df[df.duplicated()]
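Since row_id is unique, df.duplicated() compares it along with every other column and so cannot flag an order line that was entered twice. A minimal sketch that checks for duplicates while ignoring the row identifier; the header "Row ID" and the toy rows are assumptions.

```python
import pandas as pd

# Made-up data: rows 2 and 3 are the same order line entered twice,
# but each has its own row identifier (header "Row ID" is assumed)
lines = pd.DataFrame({
    "Row ID": [1, 2, 3],
    "Order ID": ["CA-1", "CA-2", "CA-2"],
    "Product ID": ["P-10", "P-20", "P-20"],
    "Sales": [261.96, 14.62, 14.62],
})

print(lines.duplicated().sum())  # 0: the unique Row ID hides the repeat

# Compare every column except the row identifier
key_cols = list(lines.columns.drop("Row ID"))
print(lines.duplicated(subset=key_cols).sum())  # 1
```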

df[df["Postal Code"] == "nan"]
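Filling every missing postal code with a single value assumes all of those rows share one location. A minimal check, on a made-up frame in the same "nan"-string state (city and state names are placeholders), lists the distinct city/state pairs among the affected rows.

```python
import pandas as pd

# Made-up frame in the notebook's post-conversion state: missing codes are "nan"
pc_demo = pd.DataFrame({
    "City": ["City A", "City A", "City B"],
    "State": ["State A", "State A", "State B"],
    "Postal Code": ["nan", "nan", "42420"],
})

# Distinct locations among the rows that lack a postal code
locations = pc_demo.loc[pc_demo["Postal Code"] == "nan", ["City", "State"]].drop_duplicates()
print(locations)

# One fill value is only safe if exactly one location is affected
single_location = len(locations) == 1
print(single_location)  # True
```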

df.loc[df["Postal Code"] == "nan", "Postal Code"] = "05049"

df[df["Postal Code"] == "nan"]
