
ABDELLAH ABERTENTE | Student programmer participating in the competition

💾 The data I

You have access to seven years of CO2 emissions data for Canadian vehicles (source):

  • "Make" - The company that manufactures the vehicle.
  • "Model" - The vehicle's model.
  • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
  • "Engine Size(L)" - The engine's displacement in liters.
  • "Cylinders" - The number of cylinders.
  • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
  • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = natural gas.
  • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
  • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

The data comes from the Government of Canada's open data website.
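The "Transmission" and "Fuel Type" columns pack several facts into short codes. The following is a minimal sketch of how those codes could be decoded. It assumes every transmission code is one of the listed type prefixes (A, AM, AS, AV, M) optionally followed by a gear count. The names `decode_transmission`, `FUEL_TYPES` and `TRANSMISSION_PATTERN` are hypothetical helpers and are not part of the dataset.

```python
import re

# Fuel type codes, as listed in the dataset description
FUEL_TYPES = {
    "X": "Regular gasoline",
    "Z": "Premium gasoline",
    "D": "Diesel",
    "E": "Ethanol (E85)",
    "N": "Natural gas",
}

# Type prefix (two-letter codes tried first), then an optional gear count
TRANSMISSION_PATTERN = re.compile(r"^(AS|AM|AV|A|M)(\d{1,2})?$")


def decode_transmission(code):
    """Split a transmission code such as 'AS6' into (type, number of gears).

    The gear count is None when the code has no digits (e.g. 'AV').
    """
    match = TRANSMISSION_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unrecognised transmission code: {code!r}")
    kind, gears = match.groups()
    return kind, int(gears) if gears else None


print(decode_transmission("AS6"))  # ('AS', 6)
print(decode_transmission("AV"))   # ('AV', None)
print(FUEL_TYPES["Z"])             # Premium gasoline
```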

# Import the pandas and numpy packages (as pd and np)
# Aliasing ("import ... as ...") gives each library a short name, so it is quicker to reference later
import pandas as pd
import numpy as np

# Load the data with the pandas 'read_csv' function
cars = pd.read_csv('data/co2_emissions_canada.csv')

# Convert each column to a NumPy array
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()

# Preview the dataframe
cars
# Look at the first ten items in the CO2 emissions array
cars_co2_emissions[:10]
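The NumPy arrays created above can also answer filtering questions directly, using boolean masks. The sketch below uses a few hypothetical toy values in place of the real arrays. It shows that comparing an array with a code produces a True/False mask, and indexing with that mask keeps only the matching rows.

```python
import numpy as np

# Hypothetical toy values standing in for cars_fuel_types / cars_fuel_consumption
fuel_types = np.array(["X", "Z", "X", "D", "E", "Z"])
fuel_consumption = np.array([8.5, 10.25, 7.5, 7.0, 14.5, 11.0])

# Element-wise comparison gives a boolean array: True where the fuel type is 'X'
regular_mask = fuel_types == "X"
print(regular_mask)  # [ True False  True False False False]

# Indexing with the mask keeps only the regular-gasoline rows before averaging
print(fuel_consumption[regular_mask].mean())  # 8.0
```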

💪 Challenge I

Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:

  1. What is the median engine size in liters?
  2. What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
  3. What is the correlation between fuel consumption and CO2 emissions?
  4. Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
  5. What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
  6. Any other insights you found during your analysis?
# >>>> 1)
# To calculate the median engine size in liters, we use the 'median()' method
cars['Engine Size(L)'].median()
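Here is a small worked example of what `median()` does, using hypothetical engine sizes. With an even number of values, pandas averages the two middle values. By default (`skipna=True`), missing values are ignored.

```python
import pandas as pd

# Hypothetical engine sizes in liters; NaN stands for a missing value
engine_sizes = pd.Series([1.5, 2.0, 3.5, 2.0, 5.3, 3.0, float("nan")])

# median() drops the NaN, sorts the six remaining values
# (1.5, 2.0, 2.0, 3.0, 3.5, 5.3) and averages the two middle ones: (2.0 + 3.0) / 2
print(engine_sizes.median())  # 2.5
```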
# >>>> 2)
# In pandas, the 'mean()' method calculates the average of a set of values

# Calculate the average fuel consumption for regular gasoline
avg_reg_gas = cars.loc[cars['Fuel Type'] == 'X', 'Fuel Consumption Comb (L/100 km)'].mean()

# Calculate the average fuel consumption for premium gasoline
avg_prem_gas = cars.loc[cars['Fuel Type'] == 'Z', 'Fuel Consumption Comb (L/100 km)'].mean()

# Calculate the average fuel consumption for ethanol
avg_ethanol = cars.loc[cars['Fuel Type'] == 'E', 'Fuel Consumption Comb (L/100 km)'].mean()

# Calculate the average fuel consumption for diesel
avg_diesel = cars.loc[cars['Fuel Type'] == 'D', 'Fuel Consumption Comb (L/100 km)'].mean()

# Print the results
print('Average fuel consumption for regular gasoline: {:.2f} L/100 km'.format(avg_reg_gas))
print('Average fuel consumption for premium gasoline: {:.2f} L/100 km'.format(avg_prem_gas))
print('Average fuel consumption for ethanol: {:.2f} L/100 km'.format(avg_ethanol))
print('Average fuel consumption for diesel: {:.2f} L/100 km'.format(avg_diesel))
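The four filtered means above can also be computed in one pass with `groupby`. The sketch below runs on a few hypothetical rows that reuse the CSV's column names. `reindex` keeps only the four fuel types the question asks about, in the question's order, and a fuel type with no rows would appear as NaN.

```python
import pandas as pd

# Hypothetical rows that reuse the CSV's column names
cars = pd.DataFrame({
    "Fuel Type": ["X", "Z", "X", "D", "E", "Z", "N"],
    "Fuel Consumption Comb (L/100 km)": [8.5, 10.5, 7.5, 7.0, 14.5, 11.5, 9.0],
})

# Mean per fuel type in one groupby, then keep X, Z, E, D in that order
fuel_types = ["X", "Z", "E", "D"]
avg_by_fuel = (
    cars.groupby("Fuel Type")["Fuel Consumption Comb (L/100 km)"]
    .mean()
    .reindex(fuel_types)
)

for fuel_type, avg in avg_by_fuel.items():
    print(f"Average fuel consumption for {fuel_type}: {avg:.2f} L/100 km")
```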
# >>>> 3)

# To measure the relationship between fuel consumption and CO2 emissions,
# we use the Series 'corr()' method, which takes the other Series as its argument
# (Pearson correlation by default)
corr = round(cars['Fuel Consumption Comb (L/100 km)'].corr(cars['CO2 Emissions(g/km)']), 2)
corr
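By default, `Series.corr` computes Pearson's r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below computes that formula by hand on hypothetical paired values and checks that it matches the pandas result. A value close to +1 means the two columns rise together almost linearly.

```python
import numpy as np
import pandas as pd

# Hypothetical paired measurements (L/100 km, g/km)
fuel = pd.Series([7.0, 8.5, 10.0, 12.5, 15.0])
co2 = pd.Series([165.0, 199.0, 236.0, 290.0, 352.0])

# Deviations from each mean
dx = fuel - fuel.mean()
dy = co2 - co2.mean()

# Pearson's r: summed co-deviations over the root of the product of summed squares
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas uses the Pearson method unless another method is requested
r_pandas = fuel.corr(co2)
print(round(r_manual, 4), round(r_pandas, 4))
```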
# >>>> 4)

# Calculate the mean CO2 emissions for SUV - SMALL vehicles
suv_co2 = cars.loc[cars['Vehicle Class'] == 'SUV - SMALL', 'CO2 Emissions(g/km)'].mean()
print('SUV - SMALL :', suv_co2.round(2))

# Calculate the mean CO2 emissions for MID-SIZE vehicles
midsize_co2 = cars.loc[cars['Vehicle Class'] == 'MID-SIZE', 'CO2 Emissions(g/km)'].mean()
print('MID-SIZE :', midsize_co2.round(2))

print("Comparing the two results, 'MID-SIZE' has the lower average CO2 emissions.")
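The comparison can also be made in code rather than by reading the printed values. The sketch below uses hypothetical rows: it groups only the two classes of interest and lets `idxmin` return the class label with the smaller mean.

```python
import pandas as pd

# Hypothetical rows; the notebook itself uses the full 'cars' DataFrame
cars = pd.DataFrame({
    "Vehicle Class": ["SUV - SMALL", "MID-SIZE", "SUV - SMALL", "MID-SIZE", "COMPACT"],
    "CO2 Emissions(g/km)": [250, 220, 270, 230, 200],
})

# Keep the two classes being compared, then average CO2 per class
classes = ["SUV - SMALL", "MID-SIZE"]
class_means = (
    cars[cars["Vehicle Class"].isin(classes)]
    .groupby("Vehicle Class")["CO2 Emissions(g/km)"]
    .mean()
)

# idxmin returns the index label (the class name) of the smallest mean
print(class_means.round(2))
print("Lower average CO2 emissions:", class_means.idxmin())
```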
# >>>> 5)

# Calculate the mean CO2 emissions for all vehicles with the 'mean()' method
all_co2 = cars['CO2 Emissions(g/km)'].mean()
print('All vehicles:', all_co2.round(2))

# Calculate the mean CO2 emissions for vehicles with an engine size of 2.0 L or smaller
small_engine_co2 = cars.loc[cars['Engine Size(L)'] <= 2.0, 'CO2 Emissions(g/km)'].mean()
print('Engine size <= 2.0 L:', small_engine_co2.round(2))
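Grouping by the engine-size condition itself gives the mean for both sides of the 2.0 L threshold, along with how many vehicles fall on each side. The sketch below uses hypothetical rows. The labels "<= 2.0 L" and "> 2.0 L" are illustrative names chosen here.

```python
import pandas as pd

# Hypothetical rows that reuse the CSV's column names
cars = pd.DataFrame({
    "Engine Size(L)": [1.5, 2.0, 2.4, 3.5, 1.8, 5.0],
    "CO2 Emissions(g/km)": [180, 200, 230, 260, 190, 340],
})

# True for engines of 2.0 L or smaller, False otherwise
small_engine = cars["Engine Size(L)"] <= 2.0

# Grouping by the boolean mask gives the mean and row count for each side
summary = cars.groupby(small_engine)["CO2 Emissions(g/km)"].agg(["mean", "count"])
summary.index = summary.index.map({True: "<= 2.0 L", False: "> 2.0 L"})
print(summary)
print("All vehicles:", round(cars["CO2 Emissions(g/km)"].mean(), 2))
```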

Question 6: Any other insights you found during your analysis?

# >>>> 6)
# Some relationships in the dataset are still unclear, such as how engine size
# relates to fuel consumption and to CO2 emissions.
# The code below measures these relationships with correlation coefficients.

# Calculate the correlation between engine size and fuel consumption
corr_fc = cars['Engine Size(L)'].corr(cars['Fuel Consumption Comb (L/100 km)'])

# Calculate the correlation between engine size and CO2 emissions
corr_co2 = cars['Engine Size(L)'].corr(cars['CO2 Emissions(g/km)'])

# Print the results
print('Correlation between engine size and fuel consumption: {:.2f}'.format(corr_fc))
print('Correlation between engine size and CO2 emissions: {:.2f}'.format(corr_co2))
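Instead of calling `corr()` once per pair, `DataFrame.corr` builds the full matrix of pairwise Pearson correlations in one call. The sketch below uses hypothetical rows. The `numeric_only=True` argument skips text columns such as "Make". It is available from pandas 1.5, and in pandas 2.x text columns otherwise raise an error.

```python
import pandas as pd

# Hypothetical rows, including one text column that corr() must skip
cars = pd.DataFrame({
    "Make": ["make_a", "make_b", "make_c", "make_d", "make_e"],
    "Engine Size(L)": [1.5, 2.0, 2.4, 3.5, 5.0],
    "Cylinders": [4, 4, 4, 6, 8],
    "Fuel Consumption Comb (L/100 km)": [7.0, 8.0, 9.0, 11.0, 14.0],
    "CO2 Emissions(g/km)": [165, 188, 210, 258, 330],
})

# Pairwise Pearson correlations between every numeric column, rounded for display
corr_matrix = cars.corr(numeric_only=True).round(2)
print(corr_matrix)
```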

💾 The data II

You have access to the following tables:

products
  • "product_id" - Product identifier.
  • "product_name" - The name of the bicycle.
  • "brand_id" - You can look up the brand's name in the "brands" table.
  • "category_id" - You can look up the category's name in the "categories" table.
  • "model_year" - The model year of the bicycle.
  • "list_price" - The price of the bicycle.
brands
  • "brand_id" - Matches the identifier in the "products" table.
  • "brand_name" - One of the nine brands the store sells.
categories
  • "category_id" - Matches the identifier in the "products" table.
  • "category_name" - One of the seven product categories in the store.
SELECT * 
FROM products;
SELECT * FROM brands;
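The table descriptions say brand and category names are looked up through `brand_id` and `category_id`. In SQL, that lookup is a JOIN. The sketch below runs such a query with Python's built-in `sqlite3` on an in-memory database. The table and column names follow the descriptions above, but every inserted row is hypothetical.

```python
import sqlite3

# In-memory database with hypothetical rows, using the documented table/column names
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brands (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        brand_id INTEGER,
        category_id INTEGER,
        model_year INTEGER,
        list_price REAL
    );
    INSERT INTO brands VALUES (1, 'Brand A'), (2, 'Brand B');
    INSERT INTO categories VALUES (1, 'Category A'), (2, 'Category B');
    INSERT INTO products VALUES
        (1, 'Bike One', 1, 2, 2018, 499.99),
        (2, 'Bike Two', 2, 1, 2019, 1299.0);
""")

# Replace each id with its name by joining on the shared identifier columns
query = """
    SELECT p.product_name, b.brand_name, c.category_name, p.list_price
    FROM products AS p
    JOIN brands AS b ON p.brand_id = b.brand_id
    JOIN categories AS c ON p.category_id = c.category_id
    ORDER BY p.product_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```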