Skip to content

Viewing the Workbook

Note: click on <> View code in right margin (or on Code button at top right) to see checks on values in all fields. Click on Report button at top right to return to clean report view.

The Assignment

The Data Provided

My Analysis Work

Primary Goal: Increase market share of Hybrid vehicles sold.

Initial Problem: Hybrid vehicle sales are underperforming, and the reason is unknown. With exploratory data analysis to examine various angles on sales and models, several discoveries led to refining our business goal and problem:

Refined Goal: Increase market share of Hybrid vehicles sold by reversing the declining sales trend on specific key models, particularly the Yaris and C-HR.

Refined Problem: The company is failing to capitalize on its successful hybrid technology for all models. Specifically, data shows that the Yaris and C-HR are experiencing a concerning downward trend in Hybrid market share on newer models, despite being top-selling vehicles overall. This indicates a potential disconnect in the marketing or positioning of these specific hybrid vehicles.

Importing the Data

Spinner
DataFrameas
toyota_sales
variable
-- Explore the data in the table
SELECT *
FROM 'toyota.csv'

Data Validation

# DATA VALIDATION (1/2): Display the first few rows of the DataFrame and view some basic information.

import pandas as pd

print(toyota_sales.info())
print(toyota_sales.shape)
print(toyota_sales.isnull().sum())
toyota_sales.describe()
# DATA VALIDATION (2/2): examining unique values in the fields

# model
model_val = sorted(toyota_sales['model'].unique())
model_count = toyota_sales['model'].nunique()

# year
year_val = sorted(toyota_sales['year'].unique())
year_count = toyota_sales['year'].nunique()

# price
price_val = sorted(toyota_sales['price'].unique())
if len(price_val) > 10:
    price_val = price_val[:5] + ['...'] + price_val[-5:]
price_count = toyota_sales['price'].nunique()

# transmission
transmission_val = sorted(toyota_sales['transmission'].unique())
transmission_count = toyota_sales['transmission'].nunique()

# mileage
mileage_val = sorted(toyota_sales['mileage'].unique())
if len(mileage_val) > 10:
    mileage_val = mileage_val[:5] + ['...'] + mileage_val[-5:]
mileage_count = toyota_sales['mileage'].nunique()

# fuelType
fuelType_val = sorted(toyota_sales['fuelType'].unique())
fuelType_count = toyota_sales['fuelType'].nunique()

# tax
tax_val = sorted(toyota_sales['tax'].unique())
tax_count = toyota_sales['tax'].nunique()

# mpg
mpg_val = sorted(toyota_sales['mpg'].unique())
if len(mpg_val) > 10:
    mpg_val = mpg_val[:5] + ['...'] + mpg_val[-5:]
mpg_count = toyota_sales['mpg'].nunique()

# engineSize
engineSize_val = sorted(toyota_sales['engineSize'].unique())
engineSize_count = toyota_sales['engineSize'].nunique()

# Print all validated data variables
print("Model Values:", model_val)
print("Model Count:", model_count)
print("Year Values:", year_val)
print("Year Count:", year_count)
print("Price Values:", price_val)
print("Price Count:", price_count)
print("Transmission Values:", transmission_val)
print("Transmission Count:", transmission_count)
print("Mileage Values:", mileage_val)
print("Mileage Count:", mileage_count)
print("FuelType Values:", fuelType_val)
print("FuelType Count:", fuelType_count)
print("Tax Values:", tax_val)
print("Tax Count:", tax_count)
print("MPG Values:", mpg_val)
print("MPG Count:", mpg_count)
print("EngineSize Values:", engineSize_val)
print("EngineSize Count:", engineSize_count)

Data Validation Results Commentary

Based on the data above, the values are as stated in the table provided with THREE EXCEPTIONS / OUTLIERS:

ONE - There is a leading space on all of the model names, so this needs to be eliminated in the data set.

TWO - We are told that there are no electric vehicles, yet the minimum engine size is 0.0. That is not possible given that all non-electric vehicles will have some size engine for internal combustion. See cells below to deal with this as follows:

a. Which vehicles have zero engine size?

b. What are the engine sizes of similar vehicles?

c. Use the median engine sizes of similar vehicles to replace the zero engine sizes.

THREE - There are values = 0 in the tax column, which demands further investigation: Are these valid values, or are they data inconsistencies which should be interpolated and replaced with non-zero values?

DATA CLEANING, ONE - Clean up the model column

# DATA CLEANING, ONE - Clean up the model column
toyota_sales['model'] = toyota_sales['model'].str.lstrip()  # remove the leading spaces
model_val_clean = sorted(toyota_sales['model'].unique())
print("Model Values:", model_val_clean) # verify the spaces are gone.
print("Number of Models: ", len(model_val_clean))  # double-check the number of models

DATA CLEANING, TWO a. Which vehicles have zero engine size?

# DATA CLEANING, TWO a. Which vehicles have zero engine size?
engine_zero = toyota_sales[toyota_sales['engineSize'] == 0]
engine_zero_indices = engine_zero.index.tolist()  # save the indices for the zero engine rows to loop back to and verify replacement values
engine_zero