CO2 Emissions Analysis in Python and Products Analysis in SQL.

1️⃣ Python 🐍 - CO2 Emissions

Now let's move on to the competition and challenge.

📖 Background

You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

💾 The data I

You have access to seven years of CO2 emissions data for Canadian vehicles (source):

  • "Make" - The company that manufactures the vehicle.
  • "Model" - The vehicle's model.
  • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
  • "Engine Size(L)" - The engine's displacement in liters.
  • "Cylinders" - The number of cylinders.
  • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
  • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
  • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
  • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
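The 55%/45% weighting and the transmission codes above can be made concrete with a small sketch. It assumes only the formats given in this data dictionary. The helpers `combined_consumption` and `decode_transmission` are hypothetical names, not part of the dataset or any library. The city and highway inputs are also hypothetical, because this dataset only ships the combined column.

```python
import re
from typing import Optional, Tuple

def combined_consumption(city_l_per_100km: float, hwy_l_per_100km: float) -> float:
    """Hypothetical helper: weight city and highway consumption 55%/45%,
    as described for the "Fuel Consumption Comb (L/100 km)" column."""
    return 0.55 * city_l_per_100km + 0.45 * hwy_l_per_100km

# Transmission prefixes as listed in the data dictionary.
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automatic Manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}

def decode_transmission(code: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split a code such as 'AS6' into (type, gears).
    Gears is None when no digits follow the prefix (e.g. 'AV')."""
    match = re.fullmatch(r"([A-Z]+)(\d*)", code)
    if match is None or match.group(1) not in TRANSMISSION_TYPES:
        raise ValueError(f"Unrecognized transmission code: {code!r}")
    prefix, gears = match.groups()
    return TRANSMISSION_TYPES[prefix], (int(gears) if gears else None)

print(combined_consumption(10.0, 8.0))  # about 9.1 L/100 km
print(decode_transmission("AS6"))
print(decode_transmission("AV"))
```

The regex consumes the longest run of capital letters before the digits. This lets two-letter prefixes such as `AS` and `AM` win over the single-letter `A`.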

The data comes from the Government of Canada's open data website.

# Import the pandas and numpy packages
import pandas as pd
import numpy as np

# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')

# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()

# Preview the dataframe
cars

💪 Challenge I

Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:

  1. What is the median engine size in liters?
  2. What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
  3. What is the correlation between fuel consumption and CO2 emissions?
  4. Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
  5. What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
  6. Any other insights you found during your analysis?
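Question 5 hinges on reading "2.0 liters or smaller" as an inclusive comparison (`<= 2.0`). In pandas the matching mask would be `cars["Engine Size(L)"] <= 2.0`. The minimal sketch below shows the same filter-then-average idea in plain Python. The sample rows are made up for illustration and are not taken from the dataset.

```python
from statistics import mean

# Hypothetical (engine size in L, CO2 emissions in g/km) pairs,
# invented for illustration; they are NOT rows from the dataset.
sample = [(1.5, 150), (2.0, 180), (2.4, 210), (3.5, 260)]

# Average over every vehicle
all_avg = mean(co2 for _, co2 in sample)

# Average over vehicles whose engine is 2.0 L or smaller (inclusive)
small_engine_avg = mean(co2 for size, co2 in sample if size <= 2.0)

print(all_avg, small_engine_avg)
```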

First of all:

  • We use cars.describe() to get summary statistics (count, mean, standard deviation, minimum, quartiles, maximum) for the numeric columns.
display(cars.describe())
#Challenge @1: median engine size
print("The median engine size =", cars["Engine Size(L)"].median(), "L")
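By default, pandas `Series.median()` skips missing values. For an even number of values it averages the two middle ones. The standard-library `statistics.median` follows the same rule, so a tiny sketch with made-up engine sizes (not dataset values) shows both cases.

```python
from statistics import median

# Made-up engine sizes in liters, not taken from the dataset.
odd_sizes = [1.4, 2.0, 3.5]          # odd count: the middle value after sorting
even_sizes = [1.4, 2.0, 3.0, 3.5]    # even count: mean of the two middle values

print(median(odd_sizes))   # 2.0
print(median(even_sizes))  # 2.5
```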
#Challenge @2 : Average Fuel Consumption for each fuel type
fuelX = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "X"].mean()
fuelZ = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "Z"].mean()
fuelD = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "D"].mean()
fuelE = cars["Fuel Consumption Comb (L/100 km)"][cars["Fuel Type"] == "E"].mean()
print(f"Average fuel consumption (L/100 km): X = {fuelX:.2f}, Z = {fuelZ:.2f}, D = {fuelD:.2f}, E = {fuelE:.2f}")
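The four filtered means above repeat one pattern. In pandas, `cars.groupby("Fuel Type")["Fuel Consumption Comb (L/100 km)"].mean()` computes every group at once, including any fuel type such as N that the manual version skips. The sketch below spells out what such a group-by average does, using hypothetical (fuel type, L/100 km) pairs rather than real dataset values. `mean_by_group` is an illustrative name, not a library function.

```python
from collections import defaultdict

def mean_by_group(rows):
    """Illustrative helper: average the value for each key in (key, value) pairs."""
    totals = defaultdict(float)  # running sum per key
    counts = defaultdict(int)    # number of rows per key
    for key, value in rows:
        totals[key] += value
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical (fuel type, L/100 km) pairs, not real dataset values.
rows = [("X", 9.0), ("X", 11.0), ("Z", 12.0), ("D", 8.5), ("E", 16.0), ("E", 18.0)]
print(mean_by_group(rows))  # {'X': 10.0, 'Z': 12.0, 'D': 8.5, 'E': 17.0}
```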
import seaborn as sns
import matplotlib.pyplot as plt

# Fuel type labels (code and name) with their average consumption values
fuel_names = ["X (regular)", "E (ethanol)", "D (diesel)", "Z (premium)"]
fuel_values = [fuelX, fuelE, fuelD, fuelZ]

# Create the bar plot
sns.set_style("whitegrid")
ax = sns.barplot(x=fuel_names, y=fuel_values, color='navy', alpha = 0.8)
for i, v in enumerate(fuel_values):
    # Annotate each bar with its value, rounded to two decimals
    ax.text(i, v + 0.2, f"{v:.2f}", color='black', ha='center')

# Show the plot
plt.title("Average Fuel Consumption for each fuel type", y = 1.1)
plt.xlabel("Fuel Type")
plt.ylabel("Average fuel consumption (L/100 km)")
plt.show()
#Challenge @3
# Scatter plot of the two columns to inspect their relationship visually
sns.scatterplot(x = cars["Fuel Consumption Comb (L/100 km)"],y = cars["CO2 Emissions(g/km)"], marker = 'x')
plt.title("The correlation between fuel consumption and CO2 emissions.");
  • We can confirm this impression with the correlation coefficient, a value between -1 and 1 that indicates the strength and direction of the linear relationship between the two columns:
cars["Fuel Consumption Comb (L/100 km)"].corr(cars["CO2 Emissions(g/km)"])
  • There is a strong positive linear relationship (about 0.92) between fuel consumption and CO2 emissions: the more fuel a vehicle burns, the more CO2 it emits.
  • The scatter plot also shows three or four distinct clusters. These might correspond to the different fuel types, since each fuel releases a different amount of CO2 per liter burned.
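`Series.corr` uses the Pearson method by default. Pearson r is the covariance of the two columns divided by the product of their standard deviations. A minimal from-scratch sketch of that formula follows. The toy numbers are invented and chosen to be exactly linear, so r comes out as 1. `pearson` is an illustrative name, not a pandas function.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance up to the shared 1/n factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented values: co2 = 23 * fuel + 2, a perfect positive line
fuel = [6.0, 8.0, 10.0, 12.0]
co2 = [140.0, 186.0, 232.0, 278.0]
print(pearson(fuel, co2))        # 1.0
print(pearson(fuel, co2[::-1]))  # -1.0 (perfect negative line)
```

Because r only measures linear association, a value near 0.92 says the points lie close to a line. The clusters noted above can still hide different slopes for different groups.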
#Challenge @4: Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?

suv_small = cars[(cars["Vehicle Class"]=="SUV - SMALL")]
mid_size = cars[(cars["Vehicle Class"]=="MID-SIZE")]

suv_small_avg = suv_small["CO2 Emissions(g/km)"].mean()
mid_size_avg = mid_size["CO2 Emissions(g/km)"].mean()

print("The SUV - SMALL average CO2 emissions : ",suv_small_avg)
print("The MID-SIZE average CO2 emissions : ", mid_size_avg)
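The two printed averages leave the final comparison to the reader. In pandas, one way to make it explicit is `cars[cars["Vehicle Class"].isin(["SUV - SMALL", "MID-SIZE"])].groupby("Vehicle Class")["CO2 Emissions(g/km)"].mean().idxmin()`. The plain-Python sketch below shows the same filter, average, and pick-the-minimum steps on made-up rows, not dataset values. `lower_average` is a hypothetical helper name.

```python
from statistics import mean

def lower_average(rows, class_a, class_b):
    """Hypothetical helper: return (class, average CO2) for whichever of
    class_a / class_b has the lower mean; rows are (vehicle_class, co2) pairs."""
    averages = {}
    for cls in (class_a, class_b):
        values = [co2 for vehicle_class, co2 in rows if vehicle_class == cls]
        if not values:
            raise ValueError(f"no rows for class {cls!r}")
        averages[cls] = mean(values)
    # Pick the (class, average) pair with the smallest average
    return min(averages.items(), key=lambda item: item[1])

# Made-up (vehicle class, CO2 g/km) rows for illustration only.
rows = [("SUV - SMALL", 230), ("SUV - SMALL", 250),
        ("MID-SIZE", 200), ("MID-SIZE", 240), ("COMPACT", 180)]
print(lower_average(rows, "SUV - SMALL", "MID-SIZE"))  # ('MID-SIZE', 220)
```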