CO2 Emissions Evaluation and Bicycle Market Analysis Python
ABDELLAH ABERTENTE | Student programmer participating in the competition
💾 The data I
You have access to seven years of CO2 emissions data for Canadian vehicles (source):
- "Make" - The company that manufactures the vehicle.
- "Model" - The vehicle's model.
- "Vehicle Class" - Vehicle class by utility, capacity, and weight.
- "Engine Size(L)" - The engine's displacement in liters.
- "Cylinders" - The number of cylinders.
- "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
- "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
- "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
- "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
The data comes from the Government of Canada's open data website.
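The "Transmission" description above implies that each code combines a type prefix with an optional gear count. As a minimal sketch of how such codes could be split apart, assuming they are written as a concatenation like 'AS6' or 'M5' (the helper name `parse_transmission` and the mapping `TRANSMISSION_TYPES` are hypothetical, not part of the dataset or the notebook):

```python
import re

# Prefix-to-label mapping copied from the column description above.
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automatic Manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}


def parse_transmission(code):
    """Split a code such as 'AS6' into (type label, gear count or None).

    Assumes the code is a letter prefix followed by optional digits.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)?", code)
    if match is None or match.group(1) not in TRANSMISSION_TYPES:
        raise ValueError(f"Unrecognised transmission code: {code!r}")
    prefix, gears = match.groups()
    gear_count = int(gears) if gears else None
    return TRANSMISSION_TYPES[prefix], gear_count


print(parse_transmission("AS6"))
print(parse_transmission("AV"))
```

If the real column follows this pattern, the helper could be applied with `cars['Transmission'].map(parse_transmission)`; any code outside the documented prefixes raises an error rather than being silently mislabelled.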
# Import pandas and NumPy under their conventional aliases (pd and np)
# Aliasing gives each library a short name, which makes it easier to reference later
import pandas as pd
import numpy as np
# Load the data with the pandas 'read_csv' function
cars = pd.read_csv('data/co2_emissions_canada.csv')
# Create a NumPy array from each column
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
# Look at the first ten items in the CO2 emissions array
cars_co2_emissions[:10]
💪 Challenge I
Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:
- What is the median engine size in liters?
- What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
- What is the correlation between fuel consumption and CO2 emissions?
- Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
- What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
- Any other insights you found during your analysis?
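Several of the questions above ask for an average within a category (per fuel type, per vehicle class). One compact way to get such per-group averages is a grouped mean. The following is a minimal sketch on made-up rows, not the competition data; the names `toy` and `avg_by_fuel` are illustrative only:

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from the dataset.
toy = pd.DataFrame({
    "Fuel Type": ["X", "X", "Z", "Z", "D"],
    "Fuel Consumption Comb (L/100 km)": [8.0, 10.0, 11.0, 13.0, 9.0],
})

# Mean consumption for every fuel type in one pass.
avg_by_fuel = toy.groupby("Fuel Type")["Fuel Consumption Comb (L/100 km)"].mean()
print(avg_by_fuel)
```

The result is a Series indexed by fuel type, so an individual average can be read with, for example, `avg_by_fuel["X"]`.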
# >>>> 1)
# To calculate the median engine size in liters, we use the 'median()' method
cars['Engine Size(L)'].median()
# >>>> 2)
# In pandas, the 'mean()' method calculates the average of a set of values
# Calculate the average fuel consumption for regular gasoline
avg_reg_gas = cars.loc[cars['Fuel Type'] == 'X', 'Fuel Consumption Comb (L/100 km)'].mean()
# Calculate the average fuel consumption for premium gasoline
avg_prem_gas = cars.loc[cars['Fuel Type'] == 'Z', 'Fuel Consumption Comb (L/100 km)'].mean()
# Calculate the average fuel consumption for ethanol
avg_ethanol = cars.loc[cars['Fuel Type'] == 'E', 'Fuel Consumption Comb (L/100 km)'].mean()
# Calculate the average fuel consumption for diesel
avg_diesel = cars.loc[cars['Fuel Type'] == 'D', 'Fuel Consumption Comb (L/100 km)'].mean()
# Print the results
print('Average fuel consumption for regular gasoline: {:.2f} L/100 km'.format(avg_reg_gas))
print('Average fuel consumption for premium gasoline: {:.2f} L/100 km'.format(avg_prem_gas))
print('Average fuel consumption for ethanol: {:.2f} L/100 km'.format(avg_ethanol))
print('Average fuel consumption for diesel: {:.2f} L/100 km'.format(avg_diesel))
# >>>> 3)
# To measure the linear relationship between fuel consumption and CO2 emissions,
# we use the 'corr()' method, which takes the other Series as its argument
# (it computes the Pearson correlation by default)
corr = cars['Fuel Consumption Comb (L/100 km)'].corr(cars['CO2 Emissions(g/km)']).round(2)
corr
# >>>> 4)
# Calculate the mean CO2 emissions for SUV - SMALL vehicles
suv_co2 = cars.loc[cars['Vehicle Class'] == 'SUV - SMALL', 'CO2 Emissions(g/km)'].mean()
print('SUV - SMALL :', suv_co2.round(2))
# Calculate the mean CO2 emissions for MID-SIZE vehicles
midsize_co2 = cars.loc[cars['Vehicle Class'] == 'MID-SIZE', 'CO2 Emissions(g/km)'].mean()
print('MID-SIZE :', midsize_co2.round(2))
print("Comparing the two results, 'MID-SIZE' has the lower average CO2 emissions")
# >>>> 5)
# Calculate the mean CO2 emissions for all vehicles using the function 'mean()'
all_co2 = cars['CO2 Emissions(g/km)'].mean()
print(all_co2.round(2))
# Calculate the mean CO2 emissions for vehicles with an engine size of 2.0 liters or smaller
small_engine_co2 = cars.loc[cars['Engine Size(L)'] <= 2.0, 'CO2 Emissions(g/km)'].mean()
print(small_engine_co2.round(2))
Question 6: Any other insights you found during your analysis?
# >>>> 6)
# Some relationships in the dataset are still unexplored,
# such as how engine size relates to fuel consumption and to CO2 emissions.
# The code below computes both correlations.
# Calculate the correlation between engine size and fuel consumption
corr_fc = cars['Engine Size(L)'].corr(cars['Fuel Consumption Comb (L/100 km)'])
# Calculate the correlation between engine size and CO2 emissions
corr_co2 = cars['Engine Size(L)'].corr(cars['CO2 Emissions(g/km)'])
# Print the results
print('Correlation between engine size and fuel consumption: {:.2f}'.format(corr_fc))
print('Correlation between engine size and CO2 emissions: {:.2f}'.format(corr_co2))
💾 The data II
You have access to the following tables:
products
- "product_id" - Product identifier.
- "product_name" - The name of the bicycle.
- "brand_id" - You can look up the brand's name in the "brands" table.
- "category_id" - You can look up the category's name in the "categories" table.
- "model_year" - The model year of the bicycle.
- "list_price" - The price of the bicycle.
brands
- "brand_id" - Matches the identifier in the "products" table.
- "brand_name" - One of the nine brands the store sells.
categories
- "category_id" - Matches the identifier in the "products" table.
- "category_name" - One of the seven product categories in the store.
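The three tables above are linked through "brand_id" and "category_id". As a rough pandas analogue of a SQL join, the sketch below resolves those identifiers to names. All rows are made up for illustration (the real store has nine brands and seven categories), and the name `catalog` is hypothetical:

```python
import pandas as pd

# Tiny made-up tables that follow the column layout described above.
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Bike A", "Bike B", "Bike C"],
    "brand_id": [1, 2, 1],
    "category_id": [10, 10, 20],
    "model_year": [2016, 2017, 2017],
    "list_price": [500.0, 750.0, 1200.0],
})
brands = pd.DataFrame({
    "brand_id": [1, 2],
    "brand_name": ["Brand X", "Brand Y"],
})
categories = pd.DataFrame({
    "category_id": [10, 20],
    "category_name": ["Category P", "Category Q"],
})

# Left joins keep every product even if a lookup row were missing.
catalog = (
    products
    .merge(brands, on="brand_id", how="left")
    .merge(categories, on="category_id", how="left")
)
print(catalog[["product_name", "brand_name", "category_name", "list_price"]])
```

A left join is used here so that a product with an unmatched identifier would still appear, with a missing name, instead of being dropped.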
SELECT *
FROM products;
SELECT * FROM brands;