Project - Toyota market strategies

=========================================================================================================================

Project requirements

Written Report

Your written report should include written text summaries and graphics of the following:

  1. Data validation
  • Describe validation and cleaning steps for every column in the data
Column name: details
  • model: Character, the model of the car, 18 possible values
  • year: Numeric, year of registration from 1998 to 2020
  • price: Numeric, listed value of the car in GBP. Assume the car also sold for this price.
  • transmission: Character, one of "Manual", "Automatic", "Semi-Auto" or "Other"
  • mileage: Numeric, listed mileage of the car at time of sale
  • fuelType: Character, one of "Petrol", "Hybrid", "Diesel" or "Other"
  • tax: Numeric, road tax in GBP. Calculated based on CO2 emissions or a fixed price depending on the age of the car.
  • mpg: Numeric, miles per gallon as reported by manufacturer
  • engineSize: Numeric, listed engine size, one of 16 possible values
  2. Exploratory Analysis
  • Include two different graphics showing single variables only to demonstrate the characteristics of data
  • Include at least one graphic showing two or more variables to represent the relationship between features
  • Describe your findings
  3. Definition of a metric for the business to monitor
  • How should the business use the metric to monitor the business problem?
  • Can you estimate initial value(s) for the metric based on the current data?
  4. Final summary including recommendations that the business should undertake

Presentation

You will be giving an overview presentation to the sales rep who requested the work. The presentation should include:

  • An overview of the project and business goals
  • A summary of the work you undertook and how this addresses the problem
  • Your key findings including the metric to monitor and current estimation
  • Your recommendations to the business

=========================================================================================================================

Summary

Interesting findings

  • Most of the cars sold in the last 6 months were registered around 2017.
  • Most of the cars sold in the last 6 months were priced around £10,000.
  • In general, Hybrid/EV buyers accept a higher price range than buyers of other fuel types.
  • The Hybrid version of a model performs better than the other versions of the same model.

The metrics

  • To gauge the effectiveness of our sales strategies, we can monitor two key metrics: total sales (the number of cars sold) and total sales figures (the revenue generated). Tracking both will show directly whether our efforts are increasing the number of cars sold and the overall revenue compared with the previous period. The initial values from the current data are 30.8% of total sales and 42.4% of total sales figures.
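The two metrics could be tracked with a small helper like the one below. This is a minimal sketch that assumes the metrics are the Hybrid share of cars sold and the Hybrid share of revenue, which is an interpretation rather than something the report states outright. The function name `hybrid_share` and the toy rows are hypothetical. The final lines cross-check the 30.8% figure against the fuel-type counts reported in the exploratory analysis, with "Other" excluded.

```python
import pandas as pd

def hybrid_share(sales: pd.DataFrame) -> dict:
    """Return the Hybrid share of units sold and of revenue, in percent.

    Assumes columns 'fuelType' and 'price', as in toyota.csv.
    """
    is_hybrid = sales["fuelType"] == "Hybrid"
    unit_share = is_hybrid.mean() * 100
    revenue_share = sales.loc[is_hybrid, "price"].sum() / sales["price"].sum() * 100
    return {
        "unit_share": round(float(unit_share), 1),
        "revenue_share": round(float(revenue_share), 1),
    }

# Hypothetical toy rows, for illustration only (not taken from toyota.csv)
toy = pd.DataFrame({
    "fuelType": ["Petrol", "Hybrid", "Petrol", "Diesel"],
    "price": [8000, 16000, 10000, 6000],
})
shares = hybrid_share(toy)

# Cross-check against the counts reported in this document ("Other" excluded)
counts = {"Petrol": 4087, "Hybrid": 2043, "Diesel": 503}
reported_unit_share = round(counts["Hybrid"] / sum(counts.values()) * 100, 1)
print(shares, reported_unit_share)
```

Recomputing the same two numbers each period on fresh data would give a like-for-like comparison with the initial values.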

Strategies

  • Promoting the cheaper Hybrid cars: Promoting the Auris and especially the Yaris hybrids appears to be a sound strategic decision. The Yaris hybrid has large growth potential compared with its own petrol version and with the Aygo, its largest petrol rival. Both the Auris and the Yaris offer a blend of tax benefits, performance, and fuel efficiency, making them ideal choices for environmentally conscious drivers in the tax-friendly bracket.

Recommendations

  1. The current data shows only the features of cars that have been sold in the last 6 months. We need further information, such as the models that were available but not sold in the same period, to examine our conclusions with a chi-square test.
  2. It would be better to include the whole second-hand car market to get a bigger picture of the trends between traditional models (Petrol and Diesel) and modern models (Hybrid and EV).
  3. Update the data regularly to monitor the latest trends in the market.
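If data on unsold cars became available, the chi-square test mentioned in the first recommendation might look like the sketch below. The counts are hypothetical placeholders, not results from this project, and `scipy.stats.chi2_contingency` is used here as one common way to run a test of independence; the notebook itself imports `pingouin`, which offers a similar test.

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table, for illustration only.
# Rows: fuel group; columns: [sold, available but not sold]
observed = [
    [120, 80],   # Hybrid
    [200, 250],  # Petrol and Diesel
]

# Test whether being sold is independent of the fuel group
chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, dof = {dof}")
```

A small p-value would suggest that the sold/unsold split differs between fuel groups, which is the kind of evidence needed to confirm the conclusions drawn from sold cars alone.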
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pingouin as pg
df = pd.read_csv('toyota.csv')
df.info()
# Data validation: print the unique values and their count for each column
for col in df.columns:
    print(col, '\n', 'The unique values are', df[col].unique(), '\n', 'The number of unique values is', df[col].nunique(), '\n')
# Clean the model column: strip surrounding whitespace, then standardize capitalization
df['model'] = df['model'].str.strip()
df['model'] = df['model'].replace({'PROACE VERSO': 'Proace Verso', 'IQ': 'iQ'})
# Confirm that 18 unique models remain
df['model'].unique()

Data Validation

Before data cleaning, the dataset has 6738 rows and 9 columns with no missing values.

  • model: 18 unique models, character format. Stripped leading spaces and corrected "PROACE VERSO" to "Proace Verso" and "IQ" to "iQ".
  • year: 23 years (1998-2020), numeric format. No cleaning required.
  • price: Numeric format. No cleaning required.
  • transmission: 4 categories, character format. No cleaning required.
  • mileage: Numeric format. No cleaning required.
  • fuelType: 4 categories, character format. No cleaning required.
  • tax: Numeric format. No cleaning required.
  • mpg: Numeric format. No cleaning required.
  • engineSize: 16 engine sizes, numeric format. No cleaning required.

After data cleaning, the dataset remains at 6738 rows and 9 columns with no missing values.
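The column checks summarized above could also be automated so that they can be rerun whenever the data is refreshed. The following is a minimal sketch: the `validate` helper and the toy rows are hypothetical, the category sets and the year range come from the column descriptions in the brief, and the rules that price must be positive and mileage non-negative are added assumptions.

```python
import pandas as pd

# Allowed categories, taken from the column descriptions in the brief
ALLOWED = {
    "transmission": ["Manual", "Automatic", "Semi-Auto", "Other"],
    "fuelType": ["Petrol", "Hybrid", "Diesel", "Other"],
}

def validate(cars: pd.DataFrame) -> dict:
    """Return a dict mapping check name to True (passed) or False (failed)."""
    checks = {
        "no_missing": not cars.isna().any().any(),
        "year_in_range": cars["year"].between(1998, 2020).all(),
        "price_positive": (cars["price"] > 0).all(),            # assumption
        "mileage_non_negative": (cars["mileage"] >= 0).all(),   # assumption
    }
    for col, allowed in ALLOWED.items():
        checks[f"{col}_known"] = cars[col].isin(allowed).all()
    return {name: bool(ok) for name, ok in checks.items()}

# Hypothetical toy rows, for illustration only (not taken from toyota.csv)
toy = pd.DataFrame({
    "year": [2017, 2015],
    "price": [10000, 8500],
    "transmission": ["Manual", "Automatic"],
    "mileage": [20000, 45000],
    "fuelType": ["Petrol", "Hybrid"],
})
result = validate(toy)

# A row registered outside 1998-2020 should fail only the year check
bad_result = validate(toy.assign(year=[2021, 2015]))
print(result, bad_result)
```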

=========================================================================================================================

df['fuelType'].value_counts()

Exploratory Analysis and Visualization

How many cars were sold in the last 6 months across the 4 fuel types?

The chart shows that Petrol is the most popular fuel type, with 4087 cars sold, followed by Hybrid (2043), Diesel (503), and Other (105).

I was intrigued by how little the "Other" fuel type contributes to sales, at only around 1.6%. I examined the rows with the "Other" fuel type and found that they likely represent misclassified entries from the other three fuel types. Unfortunately, the combination of features in this data does not let us identify the correct fuel type for these entries, so we drop these rows.
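As a quick check of the shares behind these figures, the percentages can be recomputed from the counts quoted above. This is a small sketch that uses only those four counts; the variable names are illustrative.

```python
# Cars sold per fuel type, as reported in the text above
counts = {"Petrol": 4087, "Hybrid": 2043, "Diesel": 503, "Other": 105}

total = sum(counts.values())  # should equal the 6738 rows in the dataset
shares = {fuel: round(100 * n / total, 1) for fuel, n in counts.items()}

for fuel, pct in shares.items():
    print(f"{fuel}: {pct}%")
```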

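sns.set()
plt.figure(figsize=(6,6))
# Bar chart of cars sold per fuel type, ordered from most to least sold
sns.countplot(x='fuelType', data=df, palette='muted', order=['Petrol', 'Hybrid', 'Diesel', 'Other'])
# Dashed reference line at the "Other" count (105)
plt.axhline(y=105, linestyle='--', label='y = 105', color='grey')
plt.legend()
plt.xlabel('Fuel Type')
plt.ylabel('Count / Cars Sold')
plt.title('Cars Sold by Fuel Type')
plt.savefig('01 Cars Sold by Fuel Type.png')
plt.show()
# Drop the unidentifiable "Other" fuel type rows before further analysis
df = df[df['fuelType'] != 'Other'].copy()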