CO2 Emissions Analysis in Python and Products Analysis in SQL.
1️⃣ Python 🐍 - CO2 Emissions
Now let's move on to the competition and challenge.
📖 Background
You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.
After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.
💾 The data I
You have access to seven years of CO2 emissions data for Canadian vehicles (source):
- "Make" - The company that manufactures the vehicle.
- "Model" - The vehicle's model.
- "Vehicle Class" - Vehicle class by utility, capacity, and weight.
- "Engine Size(L)" - The engine's displacement in liters.
- "Cylinders" - The number of cylinders.
- "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
- "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = natural gas.
- "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
- "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
The data comes from the Government of Canada's open data website.
# Import the pandas and numpy packages
import pandas as pd
import numpy as np
# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
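The NumPy arrays created above can be filtered with boolean masks. The snippet below is a minimal sketch of that idea on toy arrays; the `toy_` names and all values are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Toy stand-ins for the arrays built above (made-up values, not real data)
toy_engine_sizes = np.array([1.5, 2.0, 3.5, 5.0])
toy_co2_emissions = np.array([150.0, 190.0, 250.0, 330.0])

# Boolean mask: True where the engine is 2.0 L or smaller
small_engine = toy_engine_sizes <= 2.0

# Median over the whole array, mean over the masked subset only
median_size = np.median(toy_engine_sizes)
mean_co2_small = toy_co2_emissions[small_engine].mean()

print("Median engine size:", median_size)
print("Mean CO2 for engines <= 2.0 L:", mean_co2_small)
```

The same pattern should work with `cars_engine_sizes` and `cars_co2_emissions` in place of the toy arrays.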
💪 Challenge I
Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:
- What is the median engine size in liters?
- What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
- What is the correlation between fuel consumption and CO2 emissions?
- Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
- What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
- Any other insights you found during your analysis?
First, we use cars.describe() to get summary statistics for the data:
display(cars.describe())
#Challenge @1
print("The median engine size = ",cars["Engine Size(L)"].median(), "L")
#Challenge @2 : Average Fuel Consumption for each fuel type
fuelX = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "X"].mean()
fuelZ = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "Z"].mean()
fuelD = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "D"].mean()
fuelE = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "E"].mean()
print("fuelX is : ",fuelX ,",fuelZ is: ",fuelZ ,",fuelD is :", fuelD, ",fuelE is :", fuelE)
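As an alternative to filtering once per fuel type, the four averages can be computed in a single pass with `groupby`. This is a minimal sketch on a toy DataFrame; `toy` and its values are hypothetical and only reuse the column names from the dataset.

```python
import pandas as pd

# Toy DataFrame with the same column names as the dataset (made-up values)
toy = pd.DataFrame({
    "Fuel Type": ["X", "X", "Z", "Z", "D", "E"],
    "Fuel Consumption Comb (L/100 km)": [8.0, 10.0, 11.0, 13.0, 9.0, 16.0],
})

# One mean per fuel type, indexed by the fuel type code
avg_by_fuel = toy.groupby("Fuel Type")["Fuel Consumption Comb (L/100 km)"].mean()

print(avg_by_fuel.round(2))
```

Replacing `toy` with `cars` should give the same averages as the four separate filters, and it also picks up any fuel type that was not listed by hand.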
import seaborn as sns
import matplotlib.pyplot as plt
# Create a list of the fuel names and values
fuel_names = ["fuelX","fuelE","fuelD","fuelZ"]
fuel_values = [fuelX,fuelE,fuelD,fuelZ]
# Create the bar plot
sns.set_style("whitegrid")
ax = sns.barplot(x=fuel_names, y=fuel_values, color='navy', alpha = 0.8)
# Label each bar with its value, rounded to two decimals
for i, v in enumerate(fuel_values):
    ax.text(i, v + 1, f"{v:.2f}", color='black', ha='center')
# Show the plot
plt.title("Average Fuel Consumption for each fuel type", y = 1.1)
plt.xlabel("Fuel Type")
plt.ylabel("Average fuel consumption (L/100 km)")
plt.show()
#Challenge @3
# Plot the two columns against each other to see their relationship visually
sns.scatterplot(x = cars["Fuel Consumption Comb (L/100 km)"],y = cars["CO2 Emissions(g/km)"], marker = 'x')
plt.title("The correlation between fuel consumption and CO2 emissions.");
- We can confirm this with the correlation coefficient, a value between -1 and 1 that indicates the strength and direction of the linear relationship between the two columns:
cars["Fuel Consumption Comb (L/100 km)"].corr(cars["CO2 Emissions(g/km)"])
- The coefficient is about 0.92, a strong positive linear relationship between fuel consumption and CO2 emissions: the more fuel a vehicle burns per kilometer, the more CO2 it emits.
- The chart shows three or four distinct bands. These probably correspond to the fuel types, since each fuel emits a different amount of CO2 per liter burned.
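One way to check whether the bands in the scatter plot line up with fuel types is to compare the CO2 emitted per unit of fuel consumed within each fuel type. The sketch below does this on a toy DataFrame; the column names `fuel_type`, `consumption`, and `co2` and all values are made up for illustration.

```python
import pandas as pd

# Toy data: each fuel type follows its own straight line (made-up values)
toy = pd.DataFrame({
    "fuel_type": ["X", "X", "D", "D", "E", "E"],
    "consumption": [8.0, 10.0, 8.0, 10.0, 16.0, 20.0],   # L/100 km
    "co2": [184.0, 230.0, 216.0, 270.0, 256.0, 320.0],   # g/km
})

# Overall Pearson correlation across all fuel types
overall_r = toy["consumption"].corr(toy["co2"])

# CO2 per unit of consumption: roughly constant within a fuel type
# if each band in the scatter plot belongs to a single fuel
toy["co2_per_unit"] = toy["co2"] / toy["consumption"]
ratio_by_fuel = toy.groupby("fuel_type")["co2_per_unit"].mean()

print("Overall correlation:", round(overall_r, 3))
print(ratio_by_fuel)
```

If the ratios in the real data differ clearly between fuel types but vary little within each type, that would support the idea that the bands are fuel types. Passing `hue=cars["Fuel Type"]` to `sns.scatterplot` is another way to see this directly on the chart.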
#Challenge @4:Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
suv_small = cars[(cars["Vehicle Class"]=="SUV - SMALL")]
mid_size = cars[(cars["Vehicle Class"]=="MID-SIZE")]
suv_small_avg = suv_small["CO2 Emissions(g/km)"].mean()
mid_size_avg = mid_size["CO2 Emissions(g/km)"].mean()
print("The SUV - SMALL average CO2 emissions : ",suv_small_avg)
print("The MID-SIZE average CO2 emissions : ", mid_size_avg)
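The two class averages can also be compared programmatically instead of by reading the printed numbers. This is a minimal sketch on a toy DataFrame; the values are made up and say nothing about which class is actually lower in the real data.

```python
import pandas as pd

# Toy DataFrame with the dataset's column names (made-up values)
toy = pd.DataFrame({
    "Vehicle Class": ["SUV - SMALL", "SUV - SMALL", "MID-SIZE", "MID-SIZE", "COMPACT"],
    "CO2 Emissions(g/km)": [230.0, 240.0, 210.0, 230.0, 200.0],
})

# Keep only the two classes being compared
selected = toy[toy["Vehicle Class"].isin(["SUV - SMALL", "MID-SIZE"])]

# Mean emissions per class, then the label of the smaller mean
avg_by_class = selected.groupby("Vehicle Class")["CO2 Emissions(g/km)"].mean()
lower_class = avg_by_class.idxmin()

print(avg_by_class)
print("Lower average CO2 emissions:", lower_class)
```

Swapping `toy` for `cars` should reproduce the two averages printed above and name the lower one.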