Analysis with Python and SQL: CO2 emission evaluation and bicycle market analysis (Bankole Moses)
  • CO2 EMISSION EVALUATION AND BICYCLE MARKET ANALYSIS

    # Import the pandas and numpy packages
    import pandas as pd
    import numpy as np
    
    # Load the data
    cars = pd.read_csv('data/co2_emissions_canada.csv')
    
    # create numpy arrays
    cars_makes = cars['Make'].to_numpy()
    cars_models = cars['Model'].to_numpy()
    cars_classes = cars['Vehicle Class'].to_numpy()
    cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
    cars_cylinders = cars['Cylinders'].to_numpy()
    cars_transmissions = cars['Transmission'].to_numpy()
    cars_fuel_types = cars['Fuel Type'].to_numpy()
    cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
    cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
    
    # Preview the dataframe
    cars

    First, let's check whether the data is clean and do some exploratory data analysis.

    # Refer to the cars DataFrame as df (this is an alias, not a copy)
    df = cars
    df.head()
    df.nunique()
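    Note that `df = cars` binds a second name to the same DataFrame rather than copying it. The sketch below illustrates the difference from `DataFrame.copy()` on a small toy frame; the frame `cars_demo` and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the cars data; values are illustrative only
cars_demo = pd.DataFrame({"Make": ["ACURA", "BMW"], "Cylinders": [4, 6]})

alias = cars_demo               # second name for the same object
independent = cars_demo.copy()  # separate object with its own data

# Modify the frame through the alias
alias.loc[0, "Cylinders"] = 8

print(alias is cars_demo)               # both names refer to one object
print(cars_demo.loc[0, "Cylinders"])    # change is visible through the original name
print(independent.loc[0, "Cylinders"])  # the copy keeps the original value
```

    If later steps modify `df` and the original `cars` should stay untouched, `df = cars.copy()` would be the safer choice.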

    From the summary, the DataFrame contains:

    • 41 distinct makes
    • 2053 distinct models
    • 16 distinct vehicle classes
    • 51 distinct engine sizes (L)
    • 27 distinct transmissions
    • 5 distinct fuel types
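    The counts listed above are what `nunique()` reports per column. As a minimal sketch, the same call can be restricted to a few columns, and `unique()` lists the distinct values themselves. The toy frame below is an assumption for illustration and only borrows two column names from the dataset.

```python
import pandas as pd

# Toy frame with two of the dataset's column names; values are illustrative only
toy = pd.DataFrame({
    "Make": ["ACURA", "ACURA", "BMW"],
    "Fuel Type": ["Z", "X", "Z"],
})

# Number of distinct values in each selected column
counts = toy[["Make", "Fuel Type"]].nunique()
print(counts)

# The distinct values themselves, sorted for readability
fuel_values = sorted(toy["Fuel Type"].unique())
print(fuel_values)
```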
    #checking for missing data in our dataset
    missing_values = df.isnull()
    for column in missing_values.columns.values.tolist():
        print(column)
        print(missing_values[column].value_counts())
        print("")

    True indicates a missing value; False indicates that a value is present.

    The data has no missing values, so little cleaning is needed.
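    The column-by-column loop can also be condensed into a single summary. A minimal sketch, using a toy frame with one deliberately missing value (the frame and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing engine size; values are illustrative only
toy = pd.DataFrame({
    "Engine Size(L)": [2.0, np.nan, 3.5],
    "Fuel Type": ["Z", "X", "Z"],
})

# Count of missing values per column (True is summed as 1)
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Single flag: does the frame contain any missing value at all?
has_missing = bool(toy.isnull().to_numpy().any())
print(has_missing)
```

    On a dataset with no missing values, every entry of `missing_per_column` would be 0.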

    df.columns
    df["Fuel Type"].value_counts()

    From our summary, we have:

    • 3637 cars with fuel type X
    • 3202 cars with fuel type Z
    • 370 cars with fuel type E
    • 175 cars with fuel type D
    • 1 car with fuel type N
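    To put these counts in proportion, they can be converted to percentage shares. The sketch below rebuilds the counts reported above as a Series; the names `fuel_counts` and `fuel_shares` are hypothetical. On the full DataFrame, `df["Fuel Type"].value_counts(normalize=True)` should give the same shares as fractions.

```python
import pandas as pd

# Counts as reported in the summary above
fuel_counts = pd.Series(
    {"X": 3637, "Z": 3202, "E": 370, "D": 175, "N": 1},
    name="Fuel Type",
)

# Total number of cars across all fuel types
total = fuel_counts.sum()

# Percentage share of each fuel type, rounded to two decimals for display
fuel_shares = fuel_counts / total * 100
print(total)
print(fuel_shares.round(2))
```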
    # Checking data types
    df.dtypes
    df.describe()

    To find the median engine size:

    engine_size = df[["Engine Size(L)"]]