Duplicate of CO2 Emissions Evaluation and Bicycle Market Analysis
  • "Everyone Can Learn Python" Competition

    Introduction

    Hi there, my name is Era Olldashi! I'm an 18-year-old student who is starting college in September. I'm excited to dive into the world of programming and learn a new language, and I've decided to start with Python. This is my first time using Python, so I'm a complete beginner, but I'm eager to learn as much as I can.

    I know that learning to code can be challenging, but I'm excited to see what I can create with Python. I hope to use my new skills to build websites, create data visualizations, and maybe even develop my own software someday. I'm grateful for any guidance and support I can get as I start my journey into the world of programming.

    To quickly find the solutions to each challenge, please scroll down below their corresponding data. Thank you!

    1️⃣ Python 🐍 - CO2 Emissions

    Now let's move on to the competition and challenge.

    📖 Background

    You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

    After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

    💾 The data I

    You have access to seven years of CO2 emissions data for Canadian vehicles (source):

    • "Make" - The company that manufactures the vehicle.
    • "Model" - The vehicle's model.
    • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
    • "Engine Size(L)" - The engine's displacement in liters.
    • "Cylinders" - The number of cylinders.
    • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
    • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = natural gas.
    • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
    • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

    The data comes from the Government of Canada's open data website.
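The "Transmission" column combines a type code with a gear count. As a minimal sketch, assuming codes are written as letters followed by an optional number (for example `AS6` or `AV`), they can be split as shown here. The helper name `parse_transmission` and the sample codes are hypothetical and not part of the original notebook.

```python
import re

# Transmission type codes as listed in the data description above.
TRANSMISSION_TYPES = {
    "A": "Automatic",
    "AM": "Automatic Manual",
    "AS": "Automatic with select shift",
    "AV": "Continuously variable",
    "M": "Manual",
}


def parse_transmission(code):
    """Split a code such as 'AS6' into (type description, gear count or None).

    Assumes the code is uppercase letters followed by an optional number.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)?", code.strip())
    if match is None:
        raise ValueError(f"Unrecognized transmission code: {code!r}")
    letters, gears = match.groups()
    description = TRANSMISSION_TYPES.get(letters, "Unknown")
    return (description, int(gears) if gears else None)


print(parse_transmission("AS6"))
print(parse_transmission("AV"))
```

Separating the gear count from the type would make it possible to group emissions by transmission type alone.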

    # Import the packages used for data handling and plotting
    import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
    import numpy as np  # numerical arrays and statistics
    import matplotlib.pyplot as plt  # plotting library
    import seaborn as sns  # statistical data visualization built on matplotlib
    %matplotlib inline
    sns.set()  # apply seaborn's default theme
    
    # Load the data
    cars = pd.read_csv('data/co2_emissions_canada.csv')
    
    # create numpy arrays
    cars_makes = cars['Make'].to_numpy()
    cars_models = cars['Model'].to_numpy()
    cars_classes = cars['Vehicle Class'].to_numpy()
    cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
    cars_cylinders = cars['Cylinders'].to_numpy()
    cars_transmissions = cars['Transmission'].to_numpy()
    cars_fuel_types = cars['Fuel Type'].to_numpy()
    cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
    cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
    
    # Preview the dataframe
    cars
    # Look at the first ten items in the CO2 emissions array
    cars_co2_emissions[:10]

    💪 Challenge I

    Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:

    1. What is the median engine size in liters?
    2. What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
    3. What is the correlation between fuel consumption and CO2 emissions?
    4. Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
    5. What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
    6. Any other insights you found during your analysis?

    The answers to each question in Challenge No. 1:

    Quick explanation before we get down to business:

    1. To start, we need to first import the necessary packages for data processing, cleansing, analysis, and visualization.
    2. Once we've imported the required libraries, we can load the dataset into our workspace and take a look at its top rows. This will give us an initial idea of what the data looks like and what kind of information it contains.
    3. Next, we'll create NumPy arrays from each column in the dataset to make it easier to analyze.
    4. Before we move forward, it's important to check the dataset for any errors or missing values. This will give us more information about the data and help us ensure the accuracy of our analysis.
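Step 4 (checking for errors and missing values) could look roughly like the sketch below. It uses a hypothetical three-row sample with made-up values so that it runs on its own; in the notebook, the `cars` DataFrame would be passed instead. The helper name `summarize_quality` is an assumption, not part of the original analysis.

```python
import pandas as pd


def summarize_quality(df):
    """Return per-column dtype, missing-value count, and unique-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


# Hypothetical sample with made-up values; one engine size is missing on purpose.
sample = pd.DataFrame({
    "Make": ["ACURA", "ACURA", "FORD"],
    "Engine Size(L)": [2.0, None, 3.5],
    "CO2 Emissions(g/km)": [196, 221, 255],
})

quality = summarize_quality(sample)
print(quality)
print("Duplicate rows:", sample.duplicated().sum())
```

A column with a non-zero `missing` count or an unexpected `dtype` (for example, numbers stored as text) is worth inspecting before computing any averages.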
    
    # 1. What is the median engine size in liters?
    median_engine_size = np.median(cars_engine_sizes)
    print("The median engine size is:", median_engine_size, "liters")
    print('-'*100)
    
    # 2. What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
    # Group the data by fuel type and calculate the mean fuel consumption
    fuel_consumption_by_type = cars.groupby('Fuel Type')['Fuel Consumption Comb (L/100 km)'].mean()
    
    # Print the results
    print("Average fuel consumption by fuel type:")
    print(fuel_consumption_by_type)
    print('-'*100)
    
    # 3. What is the correlation between fuel consumption and CO2 emissions?
    fuel_consumption_co2_correlation = np.corrcoef(cars_fuel_consumption, cars_co2_emissions)[0,1]
    print("The correlation between fuel consumption and CO2 emissions is:", fuel_consumption_co2_correlation)
    print('-'*100)
    
    # 4. Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
    # Group the data by vehicle class and calculate the mean CO2 emissions
    co2_emissions_by_class = cars.groupby('Vehicle Class')['CO2 Emissions(g/km)'].mean()
    
    # Print the results
    print("Average CO2 emissions by vehicle class:")
    print(co2_emissions_by_class)
    print('-'*100)
    
    # 5. What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
    # Calculate the average CO2 emissions for all vehicles
    avg_co2_emissions_all = cars['CO2 Emissions(g/km)'].mean()
    
    # Calculate the average CO2 emissions for vehicles with an engine size of 2.0 liters or smaller
    small_engine_cars = cars[cars['Engine Size(L)'] <= 2.0]
    avg_co2_emissions_small_engine = small_engine_cars['CO2 Emissions(g/km)'].mean()
    
    # Print the results
    print("Average CO2 emissions for all vehicles:", avg_co2_emissions_all)
    print("Average CO2 emissions for vehicles with an engine size of 2.0 liters or smaller:", avg_co2_emissions_small_engine)
    print('-'*100)
    
    # 6. Any other insights you found during your analysis?
    # a. The most common fuel type for vehicles in this dataset is regular gasoline ('X'), followed by premium gasoline ('Z') and Ethanol (E85) ('E').
    # Count the number of occurrences of each fuel type
    fuel_counts = cars["Fuel Type"].value_counts()
    
    # Get the three most common fuel types
    most_common_fuel = fuel_counts.index[0]
    second_most_common_fuel = fuel_counts.index[1]
    third_most_common_fuel = fuel_counts.index[2]
    
    print("The most common fuel type is:", most_common_fuel)
    print("The second most common fuel type is:", second_most_common_fuel)
    print("The third most common fuel type is:", third_most_common_fuel)
    print('-'*100)
    
    # b. Correlation matrix for engine size, fuel consumption, and CO2 emissions.
    # Strongly positive coefficients would support the insight that larger engines
    # consume more fuel and emit more CO2; negative coefficients would contradict it.
    corr_matrix = cars[['Engine Size(L)', 'Fuel Consumption Comb (L/100 km)', 'CO2 Emissions(g/km)']].corr()
    
    print(corr_matrix)
    print('-'*100)
    
    # c. The output will show the standard deviation of CO2 emissions for each vehicle class. If the standard deviation is high, it confirms that there is a wide range of CO2 emissions within that class.
    std_by_class = cars.groupby('Vehicle Class')['CO2 Emissions(g/km)'].std()
    
    # Print the results
    print("Standard deviation of CO2 emissions by vehicle class:")
    print(std_by_class)
    print('-'*100)
    

    Summary:

    The aim of this analysis was to investigate how vehicle characteristics are related to CO2 emissions in Canada, using data from the Canadian Vehicle Emissions Database. This dataset includes information on various vehicle attributes, such as fuel consumption, engine size, vehicle class, and CO2 emissions.

    1. Median engine size in liters

    The median engine size is 3.0 liters.

    2. Average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)

    • Average fuel consumption for diesel (type D) is 8.84 L/100 km
    • Average fuel consumption for ethanol (type E) is 16.86 L/100 km
    • Average fuel consumption for regular gasoline (type X) is 10.08 L/100 km
    • Average fuel consumption for premium gasoline (type Z) is 11.42 L/100 km
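The single-letter fuel codes are easy to misread in a results table. The following sketch relabels the grouped averages with the fuel names from the data description. The sample values are made up for illustration, and `FUEL_NAMES` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Fuel type codes as listed in the data description.
FUEL_NAMES = {
    "X": "Regular gasoline",
    "Z": "Premium gasoline",
    "D": "Diesel",
    "E": "Ethanol (E85)",
    "N": "Natural gas",
}

# Hypothetical sample with made-up values; the notebook would use `cars`.
sample = pd.DataFrame({
    "Fuel Type": ["X", "X", "Z", "E", "D"],
    "Fuel Consumption Comb (L/100 km)": [8.0, 10.0, 11.0, 17.0, 8.5],
})

avg_by_fuel = (
    sample.groupby("Fuel Type")["Fuel Consumption Comb (L/100 km)"]
    .mean()
    .rename(index=FUEL_NAMES)  # replace codes with readable names
    .round(2)
    .sort_values()
)
print(avg_by_fuel)
```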

    3. Correlation between fuel consumption and CO2 emissions

    The correlation between fuel consumption and CO2 emissions is 0.918.

    # Visualize the relationship between fuel consumption and CO2 emissions.
    # The original column name is used directly, so `cars` is not modified
    # and earlier cells that reference this column keep working.
    sns.scatterplot(data=cars, x='Fuel Consumption Comb (L/100 km)', y='CO2 Emissions(g/km)', hue='Fuel Type')

    The scatter plot and the correlation coefficient (0.92) both indicate a strong positive linear relationship between fuel consumption and CO2 emissions: as fuel consumption increases, CO2 emissions rise roughly in proportion.

    In the plot, the band of points for ethanol (E, shown in red) is shifted slightly to the right, meaning that ethanol vehicles consume more fuel for a given level of CO2 emissions. Despite this, the overall correlation between fuel consumption and CO2 emissions remains strong.
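One way to check how much the shifted ethanol band affects the overall figure is to compute the correlation separately for each fuel type. This is only a sketch on made-up numbers, chosen so that each fuel follows its own nearly straight line; the results on the real `cars` data may differ.

```python
import pandas as pd

x_col = "Fuel Consumption Comb (L/100 km)"
y_col = "CO2 Emissions(g/km)"

# Hypothetical sample: ethanol (E) burns more fuel for similar CO2 emissions.
sample = pd.DataFrame({
    "Fuel Type": ["X"] * 4 + ["E"] * 4,
    x_col: [6.0, 8.0, 10.0, 12.0, 12.0, 15.0, 18.0, 21.0],
    y_col: [140, 187, 233, 280, 200, 250, 299, 350],
})

# Pearson correlation over all rows pooled together
overall = sample[x_col].corr(sample[y_col])

# Pearson correlation within each fuel type
per_fuel = {
    fuel: group[x_col].corr(group[y_col])
    for fuel, group in sample.groupby("Fuel Type")
}

print("Overall correlation:", round(overall, 3))
for fuel, r in per_fuel.items():
    print(f"Correlation for fuel type {fuel}:", round(r, 3))
```

In this toy sample the within-fuel correlations are higher than the pooled one, because mixing two parallel bands adds scatter around a single line.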

    4. Comparison between 'SUV - SMALL' and 'MID-SIZE'

    • The average CO2 emissions for SUV - SMALL vehicles are 236.29 g/km.
    • The average CO2 emissions for MID-SIZE vehicles are 222.46 g/km.
    • On average, the MID-SIZE vehicle class emits less CO2 than the SUV - SMALL class.
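The comparison can also be made directly in code rather than by reading the two values from the full table of class averages. The sketch below uses a made-up sample; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample with made-up values; the notebook would use `cars`.
sample = pd.DataFrame({
    "Vehicle Class": ["SUV - SMALL", "SUV - SMALL", "MID-SIZE", "MID-SIZE", "COMPACT"],
    "CO2 Emissions(g/km)": [230, 244, 215, 229, 200],
})

classes_to_compare = ["SUV - SMALL", "MID-SIZE"]

# Keep only the two classes of interest, then average per class
subset = sample[sample["Vehicle Class"].isin(classes_to_compare)]
means = subset.groupby("Vehicle Class")["CO2 Emissions(g/km)"].mean()

lower_class = means.idxmin()  # class with the lower average
difference = means.max() - means.min()

print(means)
print(f"{lower_class} has lower average CO2 emissions by {difference:.2f} g/km")
```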

    5. Average CO2 emissions for all vehicles and for vehicles with an engine size of 2.0 liters or smaller

    • Average CO2 emissions for all vehicles: 250.58 g/km
    • Average CO2 emissions for vehicles with an engine size of 2.0 liters or smaller: 198.27 g/km
    • Smaller engines generally produce lower CO2 emissions.
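The same filter can be written with NumPy arrays like the ones created at the start of the notebook, using a boolean mask. The array values below are made up for illustration.

```python
import numpy as np

# Hypothetical arrays standing in for cars_engine_sizes and cars_co2_emissions
engine_sizes = np.array([1.5, 2.0, 2.4, 3.5, 5.0])
co2_emissions = np.array([170, 200, 225, 260, 330])

# True where the engine is 2.0 liters or smaller
small_engine_mask = engine_sizes <= 2.0

avg_all = co2_emissions.mean()
avg_small = co2_emissions[small_engine_mask].mean()

print("Average CO2 emissions, all vehicles:", avg_all)
print("Average CO2 emissions, engines <= 2.0 L:", avg_small)
```

The mask selects elements by position, so it only works if both arrays come from the same rows in the same order.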

    6. Other insights

    a) Most common fuel type

    1. The most common fuel type is X.
    2. The second most common fuel type is Z.
    3. The third most common fuel type is E.

    In other words, the most common fuel type for vehicles in this dataset is regular gasoline ('X'), followed by premium gasoline ('Z') and ethanol ('E').

    b) A correlation matrix showing the correlation coefficients between engine size and fuel consumption/CO2 emissions

    A high positive correlation exists between engine size and CO2 emissions, indicating that as the engine size increases, so do the CO2 emissions. This relationship is quantified by a correlation coefficient of 0.85.

    c) The standard deviation of CO2 emissions for each vehicle class. A high standard deviation indicates a wide spread of CO2 emissions within that class.

    • STATION WAGON - MID-SIZE is the vehicle class with the widest spread of CO2 emissions, with a standard deviation of 56.41 g/km.
    • Minivan is the vehicle class with the narrowest spread of CO2 emissions, with a standard deviation of 17.74 g/km.
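The classes with the widest and narrowest spread can be picked out programmatically instead of by scanning the printed table. The sketch below uses made-up values for two classes and also reports the minimum-to-maximum range alongside the standard deviation.

```python
import pandas as pd

# Hypothetical sample with made-up values; the notebook would use `cars`.
sample = pd.DataFrame({
    "Vehicle Class": ["MINIVAN"] * 3 + ["STATION WAGON - MID-SIZE"] * 3,
    "CO2 Emissions(g/km)": [250, 260, 255, 180, 240, 300],
})

# Standard deviation, minimum, and maximum per vehicle class
spread = sample.groupby("Vehicle Class")["CO2 Emissions(g/km)"].agg(["std", "min", "max"])
spread["range"] = spread["max"] - spread["min"]

widest_class = spread["std"].idxmax()
narrowest_class = spread["std"].idxmin()

print(spread)
print("Widest spread:", widest_class)
print("Narrowest spread:", narrowest_class)
```

Standard deviation and range measure spread differently: the range depends only on the two most extreme vehicles, while the standard deviation reflects every vehicle in the class.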