Certificate case study project
About the company
Nearly New Nautical is a website that allows users to advertise their used boats for sale.
Business Goal
Increase the number of readers by 75% this year.
Business Task
Understand the boats that get the most views:
- Are they expensive?
- What do they have in common?
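To make "boats with more views" concrete, one option is to flag the top quarter of listings by views and compare their median price with the rest. This is a minimal illustration on a made-up four-row frame, not the project's data. The single-currency column `price_eur` and the 75th-percentile cut-off are hypothetical choices, because the raw Price column mixes currencies.

```python
import pandas as pd

# Toy listings for illustration only; not taken from boat_data.csv.
# "price_eur" is a hypothetical column that assumes prices were
# already converted to a single currency.
listings = pd.DataFrame({
    "boat_type": ["Motor Yacht", "Sport Boat", "Fishing Boat", "Sailing Boat"],
    "price_eur": [120000, 15000, 8000, 45000],
    "number_of_views_last_7_days": [310, 95, 40, 150],
})

views = listings["number_of_views_last_7_days"]

# Treat the top quartile of listings by views as "boats with more views".
threshold = views.quantile(0.75)
listings["high_views"] = views >= threshold

# Median price of high-view listings vs. all other listings.
medians = listings.groupby("high_views")["price_eur"].median()
print(medians)
```

Any other cut-off (top 10%, top 100 listings) works the same way; the quartile is only a convenient starting point.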
About the dataset
Column Name: Details
- Price: ~ Character, boat price as listed on the website, in different currencies (e.g. EUR, £, CHF)
- Boat Type: ~ Character, type of the boat
- Manufacturer: ~ Character, manufacturer of the boat
- Type: ~ Character, condition of the boat and engine type (e.g. Diesel, Unleaded)
- Year Built: ~ Numeric, year the boat was built
- Length: ~ Numeric, length of the boat in meters
- Width: ~ Numeric, width of the boat in meters
- Material: ~ Character, material of the boat (e.g. GRP, PVC)
- Location: ~ Character, location where the boat is listed
- Number of views last 7 days: ~ Numeric, number of views of the listing in the last 7 days
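Because Price stores the currency and the amount in one text field, it has to be split before prices can be compared. A minimal sketch, assuming each value looks like "<currency> <amount>" separated by a single space. That format is an assumption, checked here only against made-up sample strings:

```python
import pandas as pd

# Hypothetical sample values; assumes "<currency> <amount>" with one space.
prices = pd.Series(["EUR 3490", "CHF 3337", "£ 8000", "DKK 25900"])

# Split once, on the first space, into a currency part and an amount part.
parts = prices.str.split(" ", n=1, expand=True)

parsed = pd.DataFrame({
    "currency": parts[0],
    # errors="coerce" turns any amount that is not a number into NaN
    "amount": pd.to_numeric(parts[1], errors="coerce"),
})
print(parsed)
```

Converting the amounts to one currency would additionally need an exchange-rate table, which is not among the columns listed above.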
# import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
# import dataset
df = pd.read_csv('boat_data.csv')
print(df.head())
# check out the data info
print(df.describe())
print(df.shape)
# df.info() prints its summary itself and returns None, so it is not wrapped in print()
df.info()
Data Cleaning
# standardize the column names
df = df.rename(columns=str.lower)
df.rename(columns={'number of views last 7 days': 'number_of_views_last_7_days', 'year built': 'year_built'}, inplace=True)
print(df.columns)
# remove duplicate entries, recording the row count before dropping
rows_before = df.shape[0]
df.drop_duplicates(inplace=True)
# check how many duplicates were removed
print(rows_before - df.shape[0])
There are no duplicates.
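Another way to get the number of duplicate rows, before anything is dropped, is `DataFrame.duplicated()`, which marks every row that is identical to an earlier one. A small sketch on a toy frame (values made up for illustration):

```python
import pandas as pd

# Toy frame with one exact duplicate row (illustration only).
frame = pd.DataFrame({
    "boat_type": ["Sport Boat", "Sport Boat", "Cabin Boat"],
    "year_built": [2015, 2015, 2008],
})

# Count rows that repeat an earlier row across all columns.
n_duplicates = frame.duplicated().sum()

deduped = frame.drop_duplicates()
print(n_duplicates, len(frame) - len(deduped))  # both report 1
```

Passing `subset=[...]` to either method restricts the comparison to chosen columns, which matters if the same boat could be listed with, say, a different view count.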
# find missing values
print(df.isnull().any())
There are missing values in the following columns:
- manufacturer
- type
- length
- width
- material
- location
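The same list of columns can be produced directly from the boolean result of `isnull().any()` instead of reading it off by hand. A sketch on a toy frame; the values, including the manufacturer "Maker A", are made up:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only.
frame = pd.DataFrame({
    "price": ["EUR 3490", "CHF 3337"],
    "manufacturer": ["Maker A", np.nan],
    "length": [7.5, np.nan],
    "year_built": [2010, 2015],
})

# True for every column that contains at least one missing value.
has_missing = frame.isnull().any()

# Keep only the True entries and return their column names.
missing_cols = has_missing[has_missing].index.tolist()
print(missing_cols)  # ['manufacturer', 'length']
```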
# count the missing values in each column
print(df.isnull().sum())
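Raw counts are easier to judge as a share of all rows, for example when deciding whether to drop a column, drop rows, or fill in values. A minimal sketch using `isnull().mean()`, again on a made-up frame rather than the project's data:

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only.
frame = pd.DataFrame({
    "material": ["GRP", np.nan, "PVC", np.nan],
    "width": [2.5, 3.1, np.nan, 2.8],
    "year_built": [2010, 2015, 2001, 1999],
})

# The mean of a boolean mask is the fraction of True values,
# so this gives the percentage of missing values per column.
missing_share = frame.isnull().mean().mul(100).round(1)
print(missing_share.sort_values(ascending=False))
```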