Project - Toyota market strategies
=========================================================================================================================
Project requirements
Written Report
Your written report should include written text summaries and graphics of the following:
- Data validation
  - Describe validation and cleaning steps for every column in the data
Column Name Details
- model Character, the model of the car, 18 possible values
- year Numeric, year of registration from 1998 to 2020
- price Numeric, listed value of the car in GBP. Assume the car also sold for this price.
- transmission Character, one of "Manual", "Automatic", "Semi-Auto" or "Other"
- mileage Numeric, listed mileage of the car at time of sale
- fuelType Character, one of "Petrol", "Hybrid", "Diesel" or "Other"
- tax Numeric, road tax in GBP. Calculated based on CO2 emissions or a fixed price depending on the age of the car.
- mpg Numeric, miles per gallon as reported by manufacturer
- engineSize Numeric, listed engine size, one of 16 possible values
- Exploratory Analysis
  - Include two different graphics showing single variables only to demonstrate the characteristics of data
  - Include at least one graphic showing two or more variables to represent the relationship between features
  - Describe your findings
- Definition of a metric for the business to monitor
  - How should the business use the metric to monitor the business problem
  - Can you estimate initial value(s) for the metric based on the current data
- Final summary including recommendations that the business should undertake
Presentation
You will be giving an overview presentation to the sales rep who requested the work. The presentation should include:
- An overview of the project and business goals
- A summary of the work you undertook and how this addresses the problem
- Your key findings including the metric to monitor and current estimation
- Your recommendations to the business
=========================================================================================================================
Summary
Interesting findings
- Most cars sold in the last 6 months were registered around 2017.
- Most cars sold in the last 6 months were priced around £10,000.
- In general, Hybrid/EV buyers accept a higher price range than buyers of other fuel types.
- Hybrid versions show better performance than other versions of the same model.
The metrics
- To gauge the effectiveness of our sales strategies, we can closely monitor two key metrics: total sales (the number of cars sold) and total sales figures (the revenue generated). These metrics will directly show us whether our efforts are increasing both the number of cars sold and the overall revenue from one period to the next. The initial values from the current data are 30.8% of total sales and 42.4% of total sales figures.
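As a rough check on the first initial value, the 30.8% figure is consistent with the Hybrid share of units sold once the "Other" rows are excluded, using the fuel-type counts reported in the exploratory analysis. Reading the metric as a Hybrid share is an assumption here, and the helper name `unit_share` is hypothetical. A minimal sketch:

```python
# Fuel-type counts as reported in the exploratory analysis,
# with the "Other" rows already excluded.
counts = {"Petrol": 4087, "Hybrid": 2043, "Diesel": 503}


def unit_share(counts, fuel_type):
    """Percentage of all units sold that belong to one fuel type."""
    return 100 * counts[fuel_type] / sum(counts.values())


# Assumption: the monitored metric is the Hybrid share of units sold.
hybrid_unit_share = round(unit_share(counts, "Hybrid"), 1)
print(hybrid_unit_share)  # 30.8
```

The revenue-based value would follow the same pattern with summed `price` per fuel type in place of counts; it is not reproduced here because it needs the full dataset.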
Strategies
- Promoting the cheaper Hybrid cars: Promoting Auris and especially Yaris hybrids appears to be a highly strategic decision. The Yaris hybrid has huge potential to grow compared with its own petrol version and its largest petrol rival, the Aygo. Both Auris and Yaris offer a unique blend of tax benefits, performance, and fuel efficiency, making them ideal choices for environmentally conscious drivers in the tax-friendly bracket.
Recommendations
- The current data shows only the features of cars that were sold in the last 6 months. We need further information, such as the other models that were also available but not sold in the same period, to examine our conclusions with a chi-square test.
- It would be better to include the whole second-hand car market to get a bigger picture of the trends between the traditional models (Petrol and Diesel) and modern models (Hybrid and EV).
- Constantly update the data to monitor the latest trends in the market.
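To illustrate the chi-square test mentioned in the recommendations, here is a minimal sketch of the test statistic for a 2x2 table of fuel type against sale outcome. The counts are hypothetical placeholders, since the "available but not sold" data does not exist yet, and the function name `chi_square_statistic` is an assumption for illustration only.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# HYPOTHETICAL counts, not taken from toyota.csv:
# rows = fuel type (Hybrid, Petrol), columns = (sold, available but not sold)
hypothetical_table = [[30, 70], [20, 80]]
stat, dof = chi_square_statistic(hypothetical_table)
print(round(stat, 4), dof)
```

In practice, a library routine such as `scipy.stats.chi2_contingency` or `pingouin.chi2_independence` would also return the p-value; the hand-written version above only shows what the statistic measures.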
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pingouin as pg
df = pd.read_csv('toyota.csv')
df.info()
# Data Validation: check the unique values of each column
for col in df.columns:
    print(col, '\n', 'The unique values are ', df[col].unique(), '\n', 'The number of unique values is ', len(df[col].unique()), '\n')
# data validation for column model
df['model'] = df['model'].str.strip(' ')
df['model'] = df['model'].replace({'PROACE VERSO':'Proace Verso', 'IQ':'iQ'})
#df.loc[df['model']=='PROACE VERSO', 'model']= 'Proace Verso'
#df.loc[df['model']=='IQ', 'model']= 'iQ'
df['model'].unique()
Data Validation
Before data cleaning, the dataset has 6738 rows and 9 columns with no missing values.
- model: 18 unique models, character format. Remove leading spaces and correct "PROACE VERSO" to "Proace Verso" and "IQ" to "iQ".
- year: 23 years (1998-2020), numeric format. No cleaning required.
- price: Numeric format. No cleaning required.
- transmission: 4 categories, character format. No cleaning required.
- mileage: Numeric format. No cleaning required.
- fuelType: 4 categories, character format. No cleaning required.
- tax: Numeric format. No cleaning required.
- mpg: Numeric format. No cleaning required.
- engineSize: 16 engine sizes, numeric format. No cleaning required.
After data cleaning, the dataset remains at 6738 rows and 9 columns with no missing values.
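The per-column checks described above could be automated along these lines. This is a minimal sketch on a tiny hypothetical sample rather than on `toyota.csv`; the `validate` helper and the sample rows are assumptions for illustration, while the allowed categories and the year range come from the column descriptions.

```python
import pandas as pd

# Tiny HYPOTHETICAL sample standing in for toyota.csv.
sample = pd.DataFrame({
    "model": [" Yaris", " PROACE VERSO", " IQ", " Aygo"],
    "year": [2017, 2019, 2012, 1998],
    "transmission": ["Manual", "Automatic", "Semi-Auto", "Other"],
    "fuelType": ["Petrol", "Diesel", "Hybrid", "Other"],
})

# Same cleaning as in the notebook: strip spaces, then fix the casing.
sample["model"] = (
    sample["model"].str.strip().replace({"PROACE VERSO": "Proace Verso", "IQ": "iQ"})
)

# Allowed categories taken from the column descriptions.
allowed = {
    "transmission": {"Manual", "Automatic", "Semi-Auto", "Other"},
    "fuelType": {"Petrol", "Hybrid", "Diesel", "Other"},
}


def validate(frame):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if frame.isna().any().any():
        problems.append("missing values found")
    for col, values in allowed.items():
        unexpected = set(frame[col]) - values
        if unexpected:
            problems.append(f"{col}: unexpected values {sorted(unexpected)}")
    if not frame["year"].between(1998, 2020).all():
        problems.append("year outside 1998-2020")
    return problems


problems = validate(sample)
print(problems)
```

Returning a list of problems instead of raising on the first failure makes it easier to report every issue in one pass.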
=========================================================================================================================
df['fuelType'].value_counts()
Exploratory Analysis and Visualization
How many cars sold in the last 6 months across 4 fuel types?
The chart shows that Petrol is the most popular fuel type, with 4087 cars sold, followed by Hybrid (2043), Diesel (503), and Other (105).
I was intrigued by how little the "Other" fuel type contributes to sales, at only about 1.6%.
I examined the rows labelled "Other" and found that they likely represent misclassified entries from the other three fuel types. Unfortunately, we cannot identify the correct fuel type for these entries from the combination of features in this data, so we drop these rows.
sns.set()
plt.figure(figsize=(6,6))
sns.countplot(x='fuelType', data=df, palette='muted', order=['Petrol', 'Hybrid', 'Diesel', 'Other'])
plt.axhline(y=105, linestyle='--', label='y = 105', color='grey')
plt.legend()
plt.xlabel('Fuel Type')
plt.ylabel('Count / Cars Sold')
plt.title('Cars Sold by Fuel Type')
plt.savefig('01 Cars Sold by Fuel Type.png')
plt.show()
df = df[df['fuelType']!='Other']
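The fuel-type counts reported above can be cross-checked against the dataset size, and the effect of dropping "Other" can be worked out by hand. A minimal sketch using only the reported counts (the variable names are illustrative):

```python
# Counts reported for the four fuel types before dropping "Other".
counts = {"Petrol": 4087, "Hybrid": 2043, "Diesel": 503, "Other": 105}

total = sum(counts.values())  # should match the 6738 rows in the dataset
shares = {fuel: round(100 * n / total, 2) for fuel, n in counts.items()}
remaining = total - counts["Other"]  # rows left after the filter

print(total)            # 6738
print(shares["Other"])  # 1.56
print(remaining)        # 6633
```

The counts sum to the 6738 rows stated in the data validation section, so no rows are unaccounted for, and 6633 rows remain for the rest of the analysis.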