
1๏ธโƒฃ Python ๐Ÿ - CO2 Emissions

Now let's move on to the competition and challenge.

📖 Background

You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

💾 The data I

You have access to seven years of CO2 emissions data for Canadian vehicles:

  • "Make" - The company that manufactures the vehicle.
  • "Model" - The vehicle's model.
  • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
  • "Engine Size(L)" - The engine's displacement in liters.
  • "Cylinders" - The number of cylinders.
  • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
  • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = natural gas.
  • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
  • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

The data comes from the Government of Canada's open data website.
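
The "Transmission" codes combine a type prefix with an optional gear count, so a value such as AS6 can be split into its two parts. Here is a minimal sketch, assuming every code follows the letters-then-digits pattern from the column description above. The helper name `parse_transmission` and the lookup table are illustrative and are not part of the dataset or the original notebook.

```python
import re

# Prefix meanings copied from the column description above.
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automatic Manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}


def parse_transmission(code):
    """Split a code such as 'AS6' into (type label, gear count or None)."""
    match = re.fullmatch(r"([A-Z]+)(\d+)?", code.strip())
    if match is None or match.group(1) not in TRANSMISSION_TYPES:
        raise ValueError(f"Unrecognized transmission code: {code!r}")
    prefix, gears = match.groups()
    gear_count = int(gears) if gears else None
    return TRANSMISSION_TYPES[prefix], gear_count


print(parse_transmission("AS6"))
print(parse_transmission("AV"))
```

Codes without a trailing number, such as AV, return `None` for the gear count instead of guessing a value.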

# Import the libraries
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt

# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')

# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()

# Preview the dataframe
cars
  • First, we check for missing data and other anomalies by:
  1. checking the shape to learn the number of rows and columns.
  2. checking the dtype of each column.
  3. checking for null values.
  4. checking for outliers in the data by using describe.
cars.shape
cars.dtypes
null_columns = cars.columns[cars.isnull().any()]
cars[null_columns].isnull().sum()
cars.describe().round(2)
cars.info()
  • We can conclude the following:
  1. There are 7,385 rows (samples) and 9 columns (features).
  2. 5 of the 9 columns are of object dtype, 2 are integer, and 2 are float.
  3. So 4 columns hold numerical values and the other 5 hold text values.
  4. The columns Engine Size(L), Cylinders, Fuel Consumption Comb (L/100 km) and CO2 Emissions(g/km) are numerical.
  5. The columns Make, Model, Vehicle Class, Transmission and Fuel Type are categorical.
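
The numerical/categorical split above can also be derived programmatically rather than read off the dtypes by eye. Here is a minimal sketch using `select_dtypes`. The `sample` frame is a tiny made-up stand-in that reuses a few of the column names; its values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Tiny made-up sample reusing some of the dataset's column names.
sample = pd.DataFrame({
    "Make": ["ACURA", "FORD"],
    "Engine Size(L)": [2.0, 3.5],
    "Cylinders": [4, 6],
    "CO2 Emissions(g/km)": [196, 255],
})

# Numeric columns (integer and float) versus everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

Running the same two `select_dtypes` calls on `cars` should reproduce the 4 numerical and 5 categorical columns listed above.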

1. What is the median engine size in liters?

# Median engine size in liters
carsize_median = cars['Engine Size(L)'].median()
carsize_median
# Box plot of the engine size distribution
fig = px.box(cars, x="Engine Size(L)")
fig.update_layout(title_text="Five Number Summary For Engine Size(L)", title_x=0.5)
fig.show()
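
The five-number summary shown in the box plot (minimum, first quartile, median, third quartile, maximum) can also be computed directly. Here is a minimal sketch with pandas. The `engine_sizes` values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up engine sizes in liters, for illustration only.
engine_sizes = pd.Series([1.5, 2.0, 2.0, 3.0, 3.5, 5.0, 6.2])

# Minimum, Q1, median, Q3, maximum (pandas interpolates linearly by default).
summary = engine_sizes.quantile([0, 0.25, 0.5, 0.75, 1])
print(summary)
```

Applied to `cars['Engine Size(L)']`, the 0.5 entry should match `carsize_median`. The quartiles drawn in the plot may differ slightly from these values, since plotting libraries can use a different quartile method.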
  • Changing the labels into meaningful labels.