💾 The data I

I have access to seven years of CO2 emissions data for Canadian vehicles (source):

  • "Make" - The company that manufactures the vehicle.
  • "Model" - The vehicle's model.
  • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
  • "Engine Size(L)" - The engine's displacement in liters.
  • "Cylinders" - The number of cylinders.
  • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
  • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
  • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
  • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

The data comes from the Government of Canada's open data website.
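The "Transmission" column packs two pieces of information into one string: a letter code for the type and, optionally, the number of gears. The sketch below shows one way to split them apart. It assumes codes look like letters followed by optional digits (for example "AS6" or "AV"), which is inferred from the column description rather than checked against the file; `split_transmission` is a hypothetical helper name.

```python
import re

def split_transmission(code):
    """Split a code such as 'AS6' into (type, gears); gears is None if absent.

    Assumes the format is letters followed by optional digits.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)?", code.strip().upper())
    if match is None:
        raise ValueError("Unrecognized transmission code: {!r}".format(code))
    kind, gears = match.groups()
    return kind, (int(gears) if gears else None)

# Illustrative codes only, not rows taken from the dataset
for sample in ["AS6", "M5", "AV"]:
    print(sample, "->", split_transmission(sample))
```

Keeping the type and gear count separate makes it easier to group by one without the other later on.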

Importing packages

# Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Loading Data and preview

# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# Preview the dataframe
cars

Exploring Data

# Check for null values
cars.isnull().sum()
# Check for duplicates
cars[cars.duplicated()]
# Remove duplicates and reset the index so it is contiguous again
cars = cars.drop_duplicates().reset_index(drop=True)
cars
# Normalize string columns to upper case
for col in ["Make", "Model", "Vehicle Class", "Transmission"]:
    cars[col] = cars[col].str.upper()
# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()

Answering the questions

  • What is the median engine size in liters?
print("The median engine size is {} L".format(round(np.median(cars_engine_sizes), 2)))
  • What is the average fuel consumption for regular gasoline, premium gasoline, ethanol, and diesel?
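One way to approach this question is to build a boolean mask per fuel code and average the matching consumption values. The sketch below uses small made-up arrays in place of `cars_fuel_types` and `cars_fuel_consumption` so it runs on its own; the sample values, `fuel_labels`, and `mean_consumption_by_fuel` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

# Hypothetical sample values standing in for the real columns
fuel_types = np.array(["X", "Z", "X", "D", "E", "Z"])
fuel_consumption = np.array([8.0, 10.0, 9.0, 7.5, 16.0, 11.0])

# Fuel codes from the data dictionary that the question asks about
fuel_labels = {
    "X": "regular gasoline",
    "Z": "premium gasoline",
    "E": "ethanol",
    "D": "diesel",
}

def mean_consumption_by_fuel(fuel_types, fuel_consumption, codes):
    """Return {code: mean L/100 km}; codes with no matching rows are skipped."""
    result = {}
    for code in codes:
        mask = fuel_types == code  # True where the row uses this fuel
        if mask.any():
            result[code] = round(float(fuel_consumption[mask].mean()), 2)
    return result

averages = mean_consumption_by_fuel(fuel_types, fuel_consumption, fuel_labels)
for code, avg in averages.items():
    print("The average fuel consumption for {} is {} L/100 km".format(fuel_labels[code], avg))
```

With the real data, passing `cars_fuel_types` and `cars_fuel_consumption` instead of the sample arrays should give the per-fuel averages; `cars.groupby("Fuel Type")["Fuel Consumption Comb (L/100 km)"].mean()` is an equivalent pandas route.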