Everyone Can Learn Python Scholarship
Background
The first "Everyone Can Learn Python" Scholarship from DataCamp is now open for entries.
The challenges below test the Python and SQL skills you gained from Introduction to Python and Introduction to SQL and pair them with your existing problem-solving and creative thinking.
The scholarship is open to people who have completed or are completing their secondary education and are preparing to pursue a degree in computer science or data science. Students preparing for graduate-level computer science or data science degrees are also welcome to apply.
1. Python - CO2 Emissions
Now let's move on to the competition and challenge.
Background
You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.
After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.
The data I
You have access to seven years of CO2 emissions data for Canadian vehicles (source):
- "Make" - The company that manufactures the vehicle.
- "Model" - The vehicle's model.
- "Vehicle Class" - Vehicle class by utility, capacity, and weight.
- "Engine Size(L)" - The engine's displacement in liters.
- "Cylinders" - The number of cylinders.
- "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
- "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = Natural gas.
- "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
- "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.
The data comes from the Government of Canada's open data website.
# Import the pandas and numpy packages
import pandas as pd
import numpy as np
# Load the data
cars = pd.read_csv('data/co2_emissions_canada.csv')
# create numpy arrays
cars_makes = cars['Make'].to_numpy()
cars_models = cars['Model'].to_numpy()
cars_classes = cars['Vehicle Class'].to_numpy()
cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
cars_cylinders = cars['Cylinders'].to_numpy()
cars_transmissions = cars['Transmission'].to_numpy()
cars_fuel_types = cars['Fuel Type'].to_numpy()
cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
# Preview the dataframe
cars
Challenge I
Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:
- What is the median engine size in liters?
- What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
- What is the correlation between fuel consumption and CO2 emissions?
- Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
- What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
- Any other insights you found during your analysis?
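The questions about vehicle class and engine size can both be approached with boolean masks over the NumPy arrays. The following is a minimal sketch, not the analysis itself: the `toy_*` arrays and the `mean_where` helper are hypothetical names with made-up values, standing in for `cars_classes`, `cars_engine_sizes`, and `cars_co2_emissions`.

```python
import numpy as np

# Toy arrays standing in for the real columns (values are made up for illustration).
toy_classes = np.array(["SUV - SMALL", "MID-SIZE", "SUV - SMALL", "MID-SIZE", "COMPACT"])
toy_engine_sizes = np.array([2.4, 2.0, 3.5, 1.6, 1.5])
toy_co2 = np.array([240.0, 200.0, 280.0, 180.0, 170.0])


def mean_where(values, mask):
    """Return the mean of `values` at the positions where `mask` is True."""
    return float(np.mean(values[mask]))


# Average CO2 emissions per vehicle class
suv_small_mean = mean_where(toy_co2, toy_classes == "SUV - SMALL")
mid_size_mean = mean_where(toy_co2, toy_classes == "MID-SIZE")

# Average CO2 emissions overall and for engines of 2.0 liters or smaller
overall_mean = float(np.mean(toy_co2))
small_engine_mean = mean_where(toy_co2, toy_engine_sizes <= 2.0)

print("SUV - SMALL:", suv_small_mean)
print("MID-SIZE:", mid_size_mean)
print("All vehicles:", overall_mean)
print("Engine size <= 2.0 L:", small_engine_mean)
```

Swapping the toy arrays for the real ones should give the actual figures. The `<=` comparison keeps 2.0-liter engines in the "2.0 liters or smaller" group.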
# What is the median engine size in liters?
np.median(cars_engine_sizes)
# Average fuel consumption for regular gasoline (X), premium gasoline (Z), ethanol (E), and diesel (D)
mean_X = np.mean(cars_fuel_consumption[cars_fuel_types == 'X'])
print("Average fuel consumption for regular gasoline (Fuel Type = X): " + str(mean_X))
mean_Z = np.mean(cars_fuel_consumption[cars_fuel_types == 'Z'])
print("Average fuel consumption for premium gasoline (Fuel Type = Z): " + str(mean_Z))
mean_E = np.mean(cars_fuel_consumption[cars_fuel_types == 'E'])
print("Average fuel consumption for ethanol (Fuel Type = E): " + str(mean_E))
mean_D = np.mean(cars_fuel_consumption[cars_fuel_types == 'D'])
print("Average fuel consumption for diesel (Fuel Type = D): " + str(mean_D))
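The four per-fuel averages above can also be computed in one step with a pandas `groupby`. This is only a sketch of that alternative, assuming the same column names as the dataset. The `toy_cars` DataFrame and its values are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Toy DataFrame with the same column names as the dataset (values are made up).
toy_cars = pd.DataFrame({
    "Fuel Type": ["X", "Z", "X", "E", "D", "Z"],
    "Fuel Consumption Comb (L/100 km)": [8.0, 10.0, 9.0, 16.0, 9.0, 12.0],
})

# Mean fuel consumption per fuel type, highest first
mean_by_fuel = (
    toy_cars.groupby("Fuel Type")["Fuel Consumption Comb (L/100 km)"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_by_fuel)
```

On the real data, replacing `toy_cars` with `cars` would also report natural gas (N), which the manual version above leaves out.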
The results show that average fuel consumption is highest for ethanol (E), but are other variables influencing it?
# Correlation between engine size and fuel consumption
np.corrcoef(cars_engine_sizes, cars_fuel_consumption)
There seems to be a positive correlation between engine size and fuel consumption. To check whether engine size is modifying the average fuel consumption per fuel type, fuel consumption can be divided by engine size (which is measured in liters), giving the fuel consumption per liter of engine displacement for each fuel type.
consumption_per_liter = cars_fuel_consumption / cars_engine_sizes
unique_fuel_types = np.unique(cars_fuel_types)
for fuel_type in unique_fuel_types:
    # Mean consumption per liter of engine displacement for this fuel type
    mean = np.mean(consumption_per_liter[cars_fuel_types == fuel_type])
    print("Fuel Type: "+str(fuel_type)+", Mean fuel consumption per liter: "+str(mean))
The results show that, even after dividing by engine size, ethanol still has the highest fuel consumption.
# Correlation between fuel consumption and CO2 emissions
np.corrcoef(cars_fuel_consumption, cars_co2_emissions)
According to the result obtained, there is a very strong positive correlation between fuel consumption and CO2 emissions. In other words, the higher the fuel consumption, the higher the emissions.
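Note that `np.corrcoef` returns a 2x2 correlation matrix rather than a single number. The diagonal entries are 1 and the off-diagonal entries hold the Pearson coefficient between the two inputs. The sketch below shows how to pull out that coefficient. The `toy_consumption` and `toy_emissions` arrays are made-up, nearly linear values used only for illustration.

```python
import numpy as np

# Made-up, nearly linear toy values standing in for the real columns.
toy_consumption = np.array([6.0, 8.0, 10.0, 12.0, 14.0])
toy_emissions = np.array([140.0, 185.0, 232.0, 275.0, 325.0])

# 2x2 matrix: [[corr(x, x), corr(x, y)], [corr(y, x), corr(y, y)]]
corr_matrix = np.corrcoef(toy_consumption, toy_emissions)

# The Pearson coefficient between the two arrays is the off-diagonal entry
r = corr_matrix[0, 1]
print("Pearson correlation coefficient:", round(r, 4))
```

Applied to `cars_fuel_consumption` and `cars_co2_emissions`, the same indexing yields the single value that the conclusion above refers to.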