
About the company

Nearly New Nautical is a website that allows users to advertise their used boats for sale.

Business Goal

Increase the number of readers by 75% this year.

Business Task

Understand which boats get the most views:

  • Are they expensive?
  • What do they have in common?
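One way to make "boats with more views" concrete is to flag the listings in the top quartile of weekly views and compare their prices with the rest. The sketch below is hypothetical: it uses made-up toy data, a 75th-percentile cut-off chosen for illustration, and an assumed `price_eur` column that presumes prices have already been converted to a single currency (the raw dataset mixes currencies).

```python
import pandas as pd

# Toy data for illustration only; these are NOT values from boat_data.csv.
toy = pd.DataFrame({
    "price_eur": [3000, 12000, 45000, 8000, 150000, 22000, 5000, 60000],
    "number_of_views_last_7_days": [50, 120, 400, 90, 310, 75, 500, 200],
})

def split_by_views(data, views_col="number_of_views_last_7_days", q=0.75):
    """Add a boolean 'top_viewed' column: True where views reach the q-th quantile."""
    threshold = data[views_col].quantile(q)
    return data.assign(top_viewed=data[views_col] >= threshold)

flagged = split_by_views(toy)

# Compare the median price of top-viewed listings with all other listings
print(flagged.groupby("top_viewed")["price_eur"].median())
```

The median is used rather than the mean because a handful of very expensive boats would otherwise dominate the comparison.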

About the dataset

Column Name: Details

  • Price: ~ Character, boat price as listed on the website, in different currencies (e.g. EUR, £, CHF, etc.)
  • Boat Type: ~ Character, type of the boat
  • Manufacturer: ~ Character, manufacturer of the boat
  • Type: ~ Character, condition of the boat and engine type (e.g. Diesel, Unleaded, etc.)
  • Year Built: ~ Numeric, year the boat was built
  • Length: ~ Numeric, length of the boat in meters
  • Width: ~ Numeric, width of the boat in meters
  • Material: ~ Character, material of the boat (e.g. GRP, PVC, etc.)
  • Location: ~ Character, location where the boat is listed
  • Number of views last 7 days: ~ Numeric, number of views the listing received in the last 7 days
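Because Price is stored as text with a currency prefix, it has to be split into a currency and a numeric amount before any price comparison is possible. The sketch below assumes values look like `"EUR 3490"` (currency, one or more spaces, then digits); that format is an assumption to check against `df['Price'].head()` before relying on it. The helper name `split_price` is hypothetical.

```python
import pandas as pd

def split_price(prices):
    """Split strings such as 'EUR 3490' into a 'currency' and a numeric 'amount'.

    Assumes each value is '<currency> <amount>'; values that do not match
    this pattern become NaN in both columns.
    """
    parts = prices.str.extract(r"^\s*(\S+)\s+([\d.]+)\s*$")
    parts.columns = ["currency", "amount"]
    # Convert the captured digits to numbers; malformed amounts become NaN
    parts["amount"] = pd.to_numeric(parts["amount"], errors="coerce")
    return parts

sample = pd.Series(["CHF 3337", "EUR 3490", "£ 14998", "n/a"])
print(split_price(sample))
```

Converting the amounts into one currency would additionally need exchange rates, which are not part of this dataset.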
# import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
# import dataset
df = pd.read_csv('boat_data.csv')
df.head()
# check out the data info
print(df.describe())
print(df.shape)
# df.info() prints its summary itself and returns None, so it is not wrapped in print()
df.info()

Data Cleaning

# standardize the column names
df = df.rename(columns = str.lower)
df.rename(columns = {'number of views last 7 days' : 'number_of_views_last_7_days', 'year built' : 'year_built'}, inplace = True)
df.columns
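Renaming only two columns by hand leaves other names, such as `boat type`, with spaces in them. A minimal sketch of an alternative that converts every column name to lower-case snake_case in one step follows; the helper name `snake_case_columns` is hypothetical, and the toy column names are taken from the dataset description above rather than read from the CSV.

```python
import pandas as pd

def snake_case_columns(data):
    """Return a copy whose column names are trimmed, lower-cased, with spaces replaced by underscores."""
    out = data.copy()
    out.columns = (
        out.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )
    return out

toy = pd.DataFrame(columns=["Price", "Boat Type", "Year Built", "Number of views last 7 days"])
print(snake_case_columns(toy).columns.tolist())
# ['price', 'boat_type', 'year_built', 'number_of_views_last_7_days']
```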
# remove duplicate entries
rows_before = df.shape[0]
df.drop_duplicates(inplace = True)
# check how many duplicates were removed
print(rows_before - df.shape[0])

There are no duplicate rows.
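Duplicates can also be counted without modifying the DataFrame at all, using `DataFrame.duplicated()`, which marks every row that repeats an earlier one. The toy frame below is made up for illustration. Note that this only catches rows identical in every column; near-duplicates (for example the same boat relisted at a different price) would need the `subset=` argument.

```python
import pandas as pd

# Made-up rows for illustration; the second row repeats the first exactly.
toy = pd.DataFrame({
    "price": ["EUR 100", "EUR 100", "CHF 200"],
    "year_built": [2005, 2005, 2010],
})

# Count fully duplicated rows before dropping anything
n_dupes = toy.duplicated().sum()
print(n_dupes)  # 1

deduped = toy.drop_duplicates()
print(len(toy) - len(deduped))  # 1
```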

# check which columns contain missing values
df.isnull().any()

There are missing values in the following columns:

  • manufacturer
  • type
  • length
  • width
  • material
  • location
# count the missing values in each column
print(df.isnull().sum())
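Raw counts are hard to judge without the total row count, so expressing them as a percentage of rows can help decide whether to drop a column, drop rows, or impute values. The sketch below uses a made-up toy frame; on the real data the same expression would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only
toy = pd.DataFrame({
    "manufacturer": ["A", None, "B", None],
    "length": [7.5, 6.0, np.nan, 9.1],
    "price": ["EUR 100", "CHF 200", "EUR 300", "£ 400"],
})

# Share of missing values per column, as a percentage, largest first
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```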