Skip to content
1️⃣ Python 🐍 - CO2 Emissions
Now let's now move on to the competition and challenge.
📖 Background
You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.
After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.
# Import the pandas and numpy packages
import pandas as pd
import numpy as np
# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
# finding the median engine size in liters
median_engine_size = np.median(cars_engine_sizes)
print(f'The median engine size is {median_engine_size}')
# finding the average fuel consumption for each fuel type
fuel_types = { # create a dictionary with the code of the fuel type as keys
'X': 'Regular gasoline', # and the fuel type name as values
'Z': 'Premium gasoline',
'E': 'Ethanol',
'D': 'Diesel'
}
print('The average fuel consumption for each fuel type:\n')
for k, v in fuel_types.items(): # iterate over each key and value in the dictionary
mask = cars_fuel_types == k # create a mask array for each key in the dictionary
subset = cars_fuel_consumption[mask] # subset the fuel consumption array using the mask
avg_fuel_consumption = np.mean(subset) # average each subset (fuel type) for fuel consumption
print(f'{v}: {avg_fuel_consumption:.2f} L/100 km') # print the result for each fuel type
# finding the correlation between fuel consumption and CO2 emissions
corr = np.corrcoef(cars_fuel_consumption, cars_co2_emissions)
print(f'The correlation coefficient between fuel consumption and CO2 emissions is: {corr[0, 1]:.2f}')
# finding the vehicle class with lower CO2 emissions
def lowest_co2(classes):
"""
The function finds the vehicle class with the lowest CO2 emissions from a list of classes.
"""
cars_classes = cars['Vehicle Class'].to_numpy() # insert the variables for classes and
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy() # emissions inside the function
co2_emissions = [] # create an empty list for average co2 emissions per car class
for v_class in classes: # iterate over vehicle classes in the list fed to the function
mask = cars_classes == v_class # create a mask array for each class
subset = cars_co2_emissions[mask] # subset emissions using the mask
avg_co2_emission = np.mean(subset) # averaging co2 emissions for each subset
co2_emissions.append(avg_co2_emission) # appending the average co2 emissions to the empty list
min_co2_emission = min(co2_emissions) # finding the minimum average emission in the list
i = co2_emissions.index(min_co2_emission) # finding the index of the minimum average emission
low_co2_class = classes[i] # subsetting the list of classes using the minimum emission
return low_co2_class, min_co2_emission # return a tuple of minimum emission class and value
vehicle_classes = ['SUV - SMALL', 'MID-SIZE']
print(f'{lowest_co2(vehicle_classes)[0]} has lower CO2 emissions on average with {lowest_co2(vehicle_classes)[1]:.2f} g/km.')
# finding the average CO2 emissions for all vehicles
avg_co2_emissions = np.mean(cars_co2_emissions)
print(f'The average CO2 emissions for all vehicles is {avg_co2_emissions:.2f} g/km.')
# finding the average CO2 emissions for vehicles with engine size of 2.0 liters or smaller
mask = cars_engine_sizes <= 2.0
subset = cars_co2_emissions[mask]
avg_emissions_small = np.mean(subset)
print(f'The average CO2 emissions for vehicles with small engine size (2 liters or smaller) is {avg_emissions_small:.2f} g/km.')
During our analysis we found that:
- Diesel fuel has the lowest average fuel consumption among fuel types while Ethanol has the highest average consumption.
- There is a very strong correlation between fuel consumption and CO2 emissions.
- Smaller engines have lower CO2 emissoins on average.
2️⃣ SQL - Understanding the bicycle market
📖 Background
You work for a chain of bicycle stores. Your new team leader comes from a different industry and wants your help learning about the bicycle market. Specifically, they need to understand better the brands and categories for sale at your stores.
Unknown integration
DataFrameavailable as
df
variable
SELECT TOP 1 product_name
FROM products
ORDER BY list_price DESC;
Unknown integration
DataFrameavailable as
df
variable
SELECT TOP 1 product_name
FROM products
ORDER BY list_price;
Unknown integration
DataFrameavailable as
df
variable
SELECT
category_name AS category,
COUNT(DISTINCT product_id) AS n_unique_products
FROM products AS p
INNER JOIN categories AS c
ON p.category_id = c.category_id
GROUP BY category_name
ORDER BY n_unique_products DESC;
Unknown integration
DataFrameavailable as
df
variable
SELECT TOP 3
brand_name,
AVG(list_price) AS avg_price
FROM products AS p
INNER JOIN brands AS b
ON p.brand_id = b.brand_id
GROUP BY brand_name
ORDER BY avg_price DESC;
Unknown integration
DataFrameavailable as
df
variable
SELECT TOP 3
category_name,
AVG(list_price) AS avg_price
FROM products AS p
INNER JOIN categories AS c
ON p.category_id = c.category_id
GROUP BY category_name
ORDER BY avg_price DESC;
During our analysis we found the following:
- The most expensive bike is Trek Domane SLR 9 Disc - 2018.
- The least expensive bike is Strider Classic 12 Balance Bike - 2018.
- The highest number of unique bikes fall into Cruisers Bicycles category with 78 unique products while Mountain and Road bikes are tied for second place with 60 unique products.
- The lowest number of unique bikes fall into Cyclocross Bicycles category with 10 unique products.
- The 3 most expensive bike brands are Trek, Heller and Surly.
- The 3 most expensive bike categories are Electric, Road, and Cyclocross.