Competition - Everyone Can Learn Python Scholarship
1️⃣ Python - CO2 Emissions
Now let's move on to the competition and challenge.
Background
You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.
After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.
The data I
You have access to seven years of CO2 emissions data for Canadian vehicles (source):
- "Make" - The company that manufactures the vehicle.
- "Model" - The vehicle's model.
- "Vehicle Class" - Vehicle class by utility, capacity, and weight.
- "Engine Size(L)" - The engine's displacement in liters.
- "Cylinders" - The number of cylinders.
- "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
- "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
- "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
- "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
The data comes from the Government of Canada's open data website.
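The "Transmission" and "Fuel Consumption Comb (L/100 km)" descriptions above both encode a small rule, which can be sketched in code. This is a minimal illustration, not part of the original notebook: `parse_transmission` and `combined_consumption` are hypothetical helper names, and the inputs used in the comments are made-up values rather than rows from the dataset.

```python
import re

# Letter prefixes as described in the column list above.
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automatic Manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}


def parse_transmission(code):
    """Split a code such as 'AS5' into (type description, number of gears).

    The gear count is None when the code carries no digits (e.g. 'AV').
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)?", code)
    if match is None or match.group(1) not in TRANSMISSION_TYPES:
        raise ValueError(f"Unrecognized transmission code: {code!r}")
    prefix, gears = match.groups()
    return TRANSMISSION_TYPES[prefix], int(gears) if gears else None


def combined_consumption(city, highway):
    """Weighted average using the 55% city / 45% highway split (L/100 km)."""
    return 0.55 * city + 0.45 * highway
```

For example, `parse_transmission("AS5")` yields the description for "AS" together with 5 gears, and `combined_consumption(10.0, 8.0)` weights the two made-up ratings to roughly 9.1 L/100 km.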
# Import the libraries
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
- First, we check for missing data and other anomalies by:
- checking the shape to see the number of rows and columns.
- checking the dtype of each column.
- checking for null values.
- checking for outliers in the data using describe.
cars.shape
cars.dtypes
null_columns = cars.columns[cars.isnull().any()]
cars[null_columns].isnull().sum()
cars.describe().round(2)
cars.info()
- We can conclude the following:
- There are 7385 rows (samples) and 9 columns (features) in total.
- 5 of the 9 columns are of object dtype, 2 are integer, and 2 are float.
- In other words, 4 columns hold numerical values and the other 5 hold character values.
- The columns Engine Size(L), Cylinders, Fuel Consumption Comb (L/100 km) and CO2 Emissions(g/km) are numerical.
- The columns Make, Model, Vehicle Class, Transmission and Fuel Type are categorical.
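The split into numerical and categorical columns, and the outlier check from the list of steps, can also be made explicit in code. The sketch below is an assumption-laden illustration: it uses a tiny made-up frame instead of the real CSV, and the 1.5 × IQR rule in the hypothetical helper `iqr_outliers` is one common convention, not the method used in the original notebook (which only inspects `describe`).

```python
import pandas as pd

# Toy frame with made-up values; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Make": ["A", "B", "C", "D", "E"],
    "Cylinders": [4, 4, 6, 8, 4],
    "Engine Size(L)": [2.0, 2.4, 3.5, 5.0, 1.5],
    "CO2 Emissions(g/km)": [190, 200, 210, 205, 600],
})

# Separate columns by dtype instead of counting them by eye.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()


def iqr_outliers(series, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


flagged = iqr_outliers(toy["CO2 Emissions(g/km)"])
```

On the real data, the same two `select_dtypes` calls applied to `cars` would list the numerical and categorical columns named above, and `iqr_outliers` could be run on each numerical column in turn.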
1. What is the median engine size in liters?
carsize_median = cars['Engine Size(L)'].median()
carsize_median
fig = px.box(cars, "Engine Size(L)")
fig.update_layout(title_text="Five Number Summary For Engine Size(L)", title_x=0.5)
fig.show()
- Changing the labels into more meaningful ones.
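The five-number summary that the engine-size box plot displays can also be computed directly, which makes it easy to confirm that the median matches `Series.median()`. This is a minimal sketch on made-up engine sizes; `five_number_summary` is a hypothetical helper, and it relies on pandas' default linear interpolation for quantiles.

```python
import pandas as pd

# Made-up engine sizes in liters, for illustration only.
engine_sizes = pd.Series([1.5, 2.0, 2.0, 3.0, 3.5, 5.0], name="Engine Size(L)")


def five_number_summary(series):
    """Return min, Q1, median, Q3 and max as a labelled Series."""
    summary = series.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
    summary.index = ["min", "Q1", "median", "Q3", "max"]
    return summary


summary = five_number_summary(engine_sizes)
```

The quartiles drawn by a Plotly box plot may differ slightly from these values, because Plotly supports more than one quartile method; the minimum, median and maximum should agree.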