Challenge 1: CO2 emissions of different vehicles
I have access to seven years of CO2 emissions data for Canadian vehicles (source):
- "Make" - The company that manufactures the vehicle.
- "Model" - The vehicle's model.
- "Vehicle Class" - Vehicle class by utility, capacity, and weight.
- "Engine Size(L)" - The engine's displacement in liters.
- "Cylinders" - The number of cylinders.
- "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
- "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
- "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
- "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
The data comes from the Government of Canada's open data website.
1.1 Exploration of the data
This part gives an overview of the data. It covers:
- The dtype of each column and the number of NaN values,
- The number of unique values for each categorical feature,
- Summary statistics for the numerical features,
- A pair plot of the data.
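The overview items listed above can be produced with a few pandas calls. The following is a minimal sketch on a tiny made-up sample, not the real dataset; the names `overview`, `n_unique`, and `numeric_summary` are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Tiny made-up sample standing in for the real table (values are illustrative only).
cars = pd.DataFrame({
    "Make": ["A", "A", "B", "C"],
    "Vehicle Class": ["COMPACT", "SUV - SMALL", "MID-SIZE", "COMPACT"],
    "Fuel Type": ["X", "Z", "X", "D"],
    "Engine Size(L)": [2.0, 3.5, 2.4, None],
    "CO2 Emissions(g/km)": [196, 255, 210, 180],
})

# Dtype and number of missing values per column.
overview = pd.DataFrame({
    "dtype": cars.dtypes.astype(str),
    "n_missing": cars.isna().sum(),
})

# Number of unique values for each categorical (non-numeric) column.
categorical_cols = cars.select_dtypes(exclude="number").columns
n_unique = cars[categorical_cols].nunique()

# Summary statistics (count, mean, quartiles, ...) for the numerical columns.
numeric_summary = cars.describe()

print(overview, n_unique, numeric_summary, sep="\n\n")
```

On the real data, the `50%` row of `numeric_summary` gives the median of each numerical column, including the engine size.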
Note that I removed the duplicate rows. I also replaced the letter codes in the Fuel Type column with the corresponding fuel names.
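The two cleaning steps might look like the sketch below. The cleaning code itself is not shown in this report, so this is an assumption about how it could be done; the sample rows are made up and the `fuel_labels` dictionary is a hypothetical name, with labels chosen to match the fuel names used in the plots.

```python
import pandas as pd

# Made-up rows for illustration; the second row duplicates the first.
cars = pd.DataFrame({
    "Make": ["A", "A", "B", "C"],
    "Fuel Type": ["X", "X", "Z", "D"],
    "CO2 Emissions(g/km)": [196, 196, 255, 180],
})

# Letter codes from the column description mapped to readable labels
# (hypothetical mapping, spelled to match the hue order used in the plots).
fuel_labels = {
    "X": "Regular gasoline",
    "Z": "Premium gasoline",
    "D": "Diesel",
    "E": "Ethanol (E85)",
    "N": "Natural Gas",
}

# Drop exact duplicate rows, then replace the codes with the labels.
cars = cars.drop_duplicates().reset_index(drop=True)
cars["Fuel Type"] = cars["Fuel Type"].map(fuel_labels)

print(cars)
```

Note that `Series.map` with a dictionary turns any code missing from the dictionary into NaN, so it is worth checking `cars["Fuel Type"].isna().sum()` afterwards.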
import matplotlib.pyplot as plt
import seaborn as sns

# A fixed hue order keeps the fuel-type colours consistent across the graphs.
# It is hardcoded rather than taken from cars['Fuel Type'].unique(), whose order depends on the data.
hue_fuel_type = ['Diesel', 'Regular gasoline', 'Premium gasoline', 'Ethanol (E85)', 'Natural Gas']
sns.pairplot(cars, vars=['Engine Size(L)', 'Fuel Consumption Comb (L/100 km)', 'CO2 Emissions(g/km)'], hue='Fuel Type', hue_order=hue_fuel_type)
plt.show()
1.2 Questions
This part aims to help my colleague gain insight into which types of vehicles have lower CO2 emissions. The questions I had to answer were the following:
- What is the median engine size in liters?
- What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
- What is the correlation between fuel consumption and CO2 emissions?
- Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
- What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
- Any other insights you found during your analysis?
1.2.1 Median engine size in liters
During the exploration of the data we saw that the median engine size is 3 liters. Let's now see whether it varies with the Vehicle Class. It does: the vehicle class has a clear influence on the engine size.
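The overall and per-class medians can be computed with a `groupby`. This is a minimal sketch on made-up values, so the numbers it prints are not the results of the analysis; only the column names come from the dataset.

```python
import pandas as pd

# Made-up engine sizes for two vehicle classes (illustrative values only).
cars = pd.DataFrame({
    "Vehicle Class": ["MID-SIZE", "MID-SIZE", "MID-SIZE", "SUV - SMALL", "SUV - SMALL"],
    "Engine Size(L)": [2.0, 2.5, 3.5, 1.5, 2.0],
})

# Median over all vehicles.
overall_median = cars["Engine Size(L)"].median()

# Median within each vehicle class, sorted from smallest to largest.
median_by_class = (
    cars.groupby("Vehicle Class")["Engine Size(L)"]
    .median()
    .sort_values()
)

print(overall_median)
print(median_by_class)
```

Sorting the per-class medians makes it easy to see which classes sit below or above the overall median.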
1.2.2 What is the average fuel consumption for regular gasoline, premium gasoline, ethanol, and diesel?
The first graph suggests a relationship between the type of fuel used and the average fuel consumption of vehicles. However, this relationship may be misleading: Ethanol (E85) could be used mostly in vehicle classes that naturally consume more fuel, such as vans and pickups. In that case, the average consumption for Ethanol would appear higher even if, for the same type of vehicle, it were similar to that of other fuels.
To get a clearer picture, consider the second graph, which shows that the median fuel consumption for Ethanol is higher than for the other fuels, even in vehicle classes with low consumption.
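The reasoning above compares two aggregations: the mean consumption per fuel type, and the median consumption per fuel type within each vehicle class, which controls for the class. A minimal sketch of both follows, using made-up values; the variable names are illustrative and the printed numbers are not the results of the analysis.

```python
import pandas as pd

consumption = "Fuel Consumption Comb (L/100 km)"

# Made-up rows for illustration only.
cars = pd.DataFrame({
    "Vehicle Class": ["MID-SIZE", "MID-SIZE", "MID-SIZE",
                      "SUV - SMALL", "SUV - SMALL", "SUV - SMALL"],
    "Fuel Type": ["Regular gasoline", "Regular gasoline", "Ethanol (E85)",
                  "Regular gasoline", "Ethanol (E85)", "Ethanol (E85)"],
    consumption: [8.0, 9.0, 12.0, 10.0, 14.0, 16.0],
})

# Average consumption per fuel type, ignoring the vehicle class.
mean_by_fuel = cars.groupby("Fuel Type")[consumption].mean()

# Median consumption per fuel type within each vehicle class:
# rows are vehicle classes, columns are fuel types.
median_by_class_fuel = (
    cars.groupby(["Vehicle Class", "Fuel Type"])[consumption]
    .median()
    .unstack("Fuel Type")
)

print(mean_by_fuel)
print(median_by_class_fuel)
```

If one fuel is higher than the others in every row of the second table, the difference cannot be explained by vehicle class alone.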