Skip to content
Spinner
DataFrameas
df
variable
SELECT * 
FROM 'toyota.csv'
ORDER BY year;
Hidden output
Spinner
DataFrameas
df2
variable
SELECT *
FROM 'toyota.csv'
WHERE transmission IS NOT NULL
  AND column1 IS NOT NULL
  AND column2 IS NOT NULL
  AND column3 IS NOT NULL
  -- Add additional columns as needed
;
Hidden output
Spinner
DataFrameas
df1
variable
SELECT 
    COUNT(DISTINCT model) AS model_value_count,
    COUNT(DISTINCT year) AS year_value_count,
    COUNT(DISTINCT price) AS price_value_count,
	COUNT(DISTINCT transmission) AS transmission_value_count,
    COUNT(DISTINCT mileage) AS mileage_value_count,
	COUNT(DISTINCT tax) AS tax_value_count,
	COUNT(DISTINCT mpg) AS mpg_value_count,
	COUNT(DISTINCT engineSize) AS engineSize_value_count
FROM 'toyota.csv';
Hidden output

Data Validation

The dataset contains 6738 rows and 9 columns before cleaning. I have validated all the columns against the criteria in the dataset table:

  • model: 18 model without missing values same as the description. No cleaning is needed.
  • year: 23 unique values without missing values, from 1998 to 2020 same as the description. No cleaning is needed.
  • price: numeric values without missing values, same as the description. No cleaning needed.
  • transmission: 4 category values without missing values, same as description. No cleaning needed.
  • mileage: numeric values, same as the description. No cleaning needed.
  • fuelType: 4 category values without missing values, same as description. No cleaning needed.
  • tax: numeric values, same as the description. No cleaning needed.
  • mpg: numeric values, same as the description. No cleaning needed.
  • engineSize: 16 unique values without missing value, same as description. No cleaning is needed. After the validation, the dataset contains 6738 rows and 9 columns without missing values.
Spinner
DataFrameas
df3
variable
SELECT 
    fuelType, 
    COUNT(price) AS total_sales
FROM 'toyota.csv'
 WHERE year = '2020'
GROUP BY fuelType
ORDER BY total_sales DESC
Hidden output
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
toyota = pd.read_csv('toyota.csv')

# Filter the data for the last six months

# Group by fuelType and calculate total sales
fuel_sales = toyota.groupby('fuelType')['price'].count().reset_index(name='count')
fuel_sales = fuel_sales.sort_values(by='count', ascending=False)

print(fuel_sales)
plt.figure(figsize=(8, 5))
plt.bar(fuel_sales['fuelType'], fuel_sales['count'], zorder=3)
plt.grid(axis='y', color='grey', linestyle='--', linewidth=0.5, zorder=0)
plt.xlabel('Fuel Type')
plt.ylabel('Total Sales')
plt.title('Total Sales by Fuel Type')
plt.show()
Hidden output
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
toyota = pd.read_csv('toyota.csv')

# Filter the dataset for Hybrid fuel type and select the 'model' column
hybrid_model = toyota[toyota['fuelType'] == 'Hybrid'][['model']]
hybrid_model_group = hybrid_model.groupby('model')['model'].count().reset_index(name='count').sort_values(by='count', ascending=False)

# Display the grouped data
print(hybrid_model_group)

plt.figure(figsize=(8, 5))
plt.bar(hybrid_model_group['model'], hybrid_model_group['count'], zorder=3)
plt.grid(axis='y', color='grey', linestyle='--', linewidth=0.5, zorder=0)
plt.xlabel('Car Model')
plt.ylabel('Total Car Model Sold')
plt.title('Total Sales by Car Model')
plt.show()
Hidden output
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

toyota = pd.read_csv('toyota.csv')

plt.figure(figsize=(8, 5))
plt.hist(toyota['price'], bins=70, edgecolor='black', zorder=3)
plt.grid(axis='y', color='grey', linestyle='--', linewidth=0.5, zorder=0)
plt.xlabel('Car Price')
plt.ylabel('Number of Cars Sold')
plt.title('Distribution of Car Price')
plt.show()
Hidden output
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

toyota = pd.read_csv('toyota.csv')

plt.figure(figsize=(10, 6))
sns.boxplot(x='fuelType', y='price', data=toyota, zorder=3)
plt.grid(axis='y', color='grey', linestyle='--', linewidth=0.5, zorder=0)
plt.xlabel('Fuel Type')
plt.ylabel('Price')
plt.title('Distribution of Prices by Fuel Type')
plt.show()
Hidden output

How the sales differ between hybrid and other fuel types from sales over the last six months.

Over the last 6 months, almost half of the used cars we sold are using petrol, followed by hybrid, diesel then other. Also, we can see that the number of cars with petrol sold is almost twice the number of cars with hybrid. Therefore, is huge room to improve the sales of hybrid cars.

The chart below shows that the highest model of hybrid car sold is Yaris, followed by Aurin, and in 3rd position C-HR.

Looking at the distribution of prices for the past 6 months, the graph suggests that most cars sold are priced between 0 and 20000. As the price increases, the number of cars sold decreases, following a right-skewed pattern. The peak of the distribution is around 10000, suggesting that this price range is the most popular among buyers.

The median price for Petrol is the lowest, followed by Other, Diesel, and Hybrid indicating that petrol cars are more affordable, followed by other, and so on. The box for Hybrid has the largest range, indicating a wider spread of prices between 12000 to 21000. There are a few outliers, particularly for the Other and Diesel categories, suggesting some cars in these categories have significantly higher prices compared to the majority. The distribution of prices for Petrol is skewed to the right, meaning there are some higher-priced Petrol cars. The distributions for Other, Hybrid, and Diesel appear more symmetrical. Overall, the box plot shows that while the median prices for Hybrid and Diesel are similar, the range of prices for Diesel is wider, indicating more variation in pricing. Petrol cars generally have lower prices, with a few outliers suggesting some high-priced models

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

toyota = pd.read_csv('toyota.csv')
toyota_Hybrid = toyota[toyota['fuelType'] == 'Hybrid'][['price', 'model']]

plt.figure(figsize=(10, 6))
sns.boxplot(x='model', y='price', data=toyota_Hybrid, zorder=3)
plt.grid(axis='y', color='grey', linestyle='--', linewidth=0.5, zorder=0)
plt.xlabel('Hybrid Car Model')
plt.ylabel('Price')
plt.title('Distribution of Prices by Car Model (Hybrid)')
plt.show()
Hidden output
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

toyota = pd.read_csv('toyota.csv')

tax_grp_count = toyota.groupby(['tax', 'fuelType']).size().reset_index(name='count')

sns.barplot(data=tax_grp_count, x='tax', y='count', hue='fuelType', dodge=False, alpha=0.5, palette='Set1', zorder=3)
plt.grid(axis='y', color='grey', linestyle='--', linewidth=0.5, zorder=0)
plt.xlabel('Tax')
plt.ylabel('Count / Number of Cars Sold')
plt.title('Distribution of Cars by Tax and Fuel Type')
plt.xticks(rotation=90)
plt.show()
Hidden output

How to make the Hybrid more appealing

Given that hybrid cars are more expensive, and the Yaris, Auris, and Prius models have consistently sold below 10000 GBP in the past six months, a targeted campaign focused on these models could be particularly effective. By emphasizing their price competitiveness compared to petrol cars, we can attract a broader audience who are concerned about affordability.

To make the Hybrid cars more attractive, we can leverage the advantage of the Hybrid cars which has the most common road tax as 0 GBP as shown in the bar chart below compare to petrol cars with the most common road tax of 145 GBP. Therefore, for people buying traditional fuel type cars, this can be ab excellent choice base on this information.

Business Metrics

To achieve our goal of increasing the sales of hybrid and electric cars next year, we will use the percentage of hybrid and electric cars sold in the last six months as our key metric.

Currently, approximately 30% of the cars sold are hybrid and electric, based on our data from the past six months. By increasing this percentage in the next six months, we will be on track to meet our sales targets.

Recommendations

For the following weeks, I will recommend we focus on the following steps;

  • Use key metrics to monitored whether there is a strong sign if the sales will increase or not.
  • To promote the campaign as soon as possible, we should smartly promote the hybrid models in the market based on:
    • Road tax reduction`
    • Less expensive models i.e. Prius, Yaris and Auris model
  • Stay on top of the current hybrid and electric cars market so that newest information can be integrated in the campaign.
  • Data collection for in-depth analysis
    • Improve data quality - what the other fuel type means?
    • New related data - Using different fuel types in the same car model to compare eco-friendly cars and other fuel types