Skip to content
Competition - Everyone Can Learn Python Scholarship
0
  • AI Chat
  • Code
  • Report
  • 1️⃣ Python 🐍 - CO2 Emissions

    Now let's now move on to the competition and challenge.

    📖 Background

    You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

    After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

    # Import the pandas and numpy packages
    import pandas as pd
    import numpy as np
    
    # Load the data
    cars = pd.read_csv('data/co2_emissions_canada.csv')
    
    # create numpy arrays
    cars_makes = cars['Make'].to_numpy()
    cars_models = cars['Model'].to_numpy()
    cars_classes = cars['Vehicle Class'].to_numpy()
    cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
    cars_cylinders = cars['Cylinders'].to_numpy()
    cars_transmissions = cars['Transmission'].to_numpy()
    cars_fuel_types = cars['Fuel Type'].to_numpy()
    cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
    cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
    
    # Preview the dataframe
    cars
    # finding the median engine size in liters
    
    median_engine_size = np.median(cars_engine_sizes)
    print(f'The median engine size is {median_engine_size}')
    # finding the average fuel consumption for each fuel type
    
    fuel_types = {                                     # create a dictionary with the code of the fuel type as keys
        'X': 'Regular gasoline',                       # and the fuel type name as values
        'Z': 'Premium gasoline',
        'E': 'Ethanol',
        'D': 'Diesel'
    }
    
    print('The average fuel consumption for each fuel type:\n')
    
    for k, v in fuel_types.items():                             # iterate over each key and value in the dictionary
        mask = cars_fuel_types == k                            # create a mask array for each key in the dictionary
        subset = cars_fuel_consumption[mask]                     # subset the fuel consumption array using the mask
        avg_fuel_consumption = np.mean(subset)               # average each subset (fuel type) for fuel consumption 
        print(f'{v}: {avg_fuel_consumption:.2f} L/100 km')                    # print the result for each fuel type
    # finding the correlation between fuel consumption and CO2 emissions
    
    corr = np.corrcoef(cars_fuel_consumption, cars_co2_emissions)
    print(f'The correlation coefficient between fuel consumption and CO2 emissions is: {corr[0, 1]:.2f}')
    # finding the vehicle class with lower CO2 emissions
    
    def lowest_co2(classes):
        """
        The function finds the vehicle class with the lowest CO2 emissions from a list of classes.
        """
        cars_classes = cars['Vehicle Class'].to_numpy()                     # insert the variables for classes and
        cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()         # emissions inside the function
        
        co2_emissions = []                           # create an empty list for average co2 emissions per car class 
        
        for v_class in classes:                      # iterate over vehicle classes in the list fed to the function
            mask = cars_classes == v_class                                     # create a mask array for each class
            subset = cars_co2_emissions[mask]                                     # subset emissions using the mask
            avg_co2_emission = np.mean(subset)                            # averaging co2 emissions for each subset
            co2_emissions.append(avg_co2_emission)          # appending the average co2 emissions to the empty list
        
        min_co2_emission = min(co2_emissions)                    # finding the minimum average emission in the list
        i = co2_emissions.index(min_co2_emission)               # finding the index of the minimum average emission
        low_co2_class = classes[i]                      # subsetting the list of classes using the minimum emission
        
        return low_co2_class, min_co2_emission                 # return a tuple of minimum emission class and value
    
    
    vehicle_classes = ['SUV - SMALL', 'MID-SIZE']
    print(f'{lowest_co2(vehicle_classes)[0]} has lower CO2 emissions on average with {lowest_co2(vehicle_classes)[1]:.2f} g/km.')
    
    # finding the average CO2 emissions for all vehicles
    
    avg_co2_emissions = np.mean(cars_co2_emissions)
    print(f'The average CO2 emissions for all vehicles is {avg_co2_emissions:.2f} g/km.')
    
    # finding the average CO2 emissions for vehicles with engine size of 2.0 liters or smaller
    
    mask = cars_engine_sizes <= 2.0
    subset = cars_co2_emissions[mask]
    avg_emissions_small = np.mean(subset)
    print(f'The average CO2 emissions for vehicles with small engine size (2 liters or smaller) is {avg_emissions_small:.2f} g/km.')

    During our analysis we found that:

    • Diesel fuel has the lowest average fuel consumption among fuel types while Ethanol has the highest average consumption.
    • There is a very strong correlation between fuel consumption and CO2 emissions.
    • Smaller engines have lower CO2 emissoins on average.

    2️⃣ SQL - Understanding the bicycle market

    📖 Background

    You work for a chain of bicycle stores. Your new team leader comes from a different industry and wants your help learning about the bicycle market. Specifically, they need to understand better the brands and categories for sale at your stores.

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT TOP 1 product_name
    FROM products
    ORDER BY list_price DESC;
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT TOP 1 product_name
    FROM products
    ORDER BY list_price;
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT
        category_name AS category,
        COUNT(DISTINCT product_id) AS n_unique_products
    FROM products AS p
    INNER JOIN categories AS c
        ON p.category_id = c.category_id
    GROUP BY category_name
    ORDER BY n_unique_products DESC;
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT TOP 3
        brand_name,
        AVG(list_price) AS avg_price
    FROM products AS p
    INNER JOIN brands AS b
        ON p.brand_id = b.brand_id
    GROUP BY brand_name
    ORDER BY avg_price DESC;
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT TOP 3
        category_name,
        AVG(list_price) AS avg_price
    FROM products AS p
    INNER JOIN categories AS c
        ON p.category_id = c.category_id
    GROUP BY category_name
    ORDER BY avg_price DESC;

    During our analysis we found the following:

    • The most expensive bike is Trek Domane SLR 9 Disc - 2018.
    • The least expensive bike is Strider Classic 12 Balance Bike - 2018.
    • The highest number of unique bikes fall into Cruisers Bicycles category with 78 unique products while Mountain and Road bikes are tied for second place with 60 unique products.
    • The lowest number of unique bikes fall into Cyclocross Bicycles category with 10 unique products.
    • The 3 most expensive bike brands are Trek, Heller and Surly.
    • The 3 most expensive bike categories are Electric, Road, and Cyclocross.