Competition - Everyone Can Learn Python Scholarship

    Everyone Can Learn Python Scholarship

    📖 Background

    The first "Everyone Can Learn Python" Scholarship from DataCamp is now open for entries.

    The challenges below test the Python and SQL skills you gained from Introduction to Python and Introduction to SQL and pair them with your existing problem-solving and creative thinking.

    The scholarship is open to people who have completed or are completing their secondary education and are preparing to pursue a degree in computer science or data science. Students preparing for graduate-level computer science or data science degrees are also welcome to apply.

    💡 Learn more

    The following DataCamp courses can help review the skills needed for this challenge:

    • Introduction to Python
    • Introduction to SQL

    ℹ️ Introduction to Data Science Notebooks

    You can skip this section if you are already familiar with data science notebooks.

    Data science notebooks

    A data science notebook is a document containing text cells (what you're reading now) and code cells. What makes a notebook unique is that it is interactive: you can change or add code cells, then run a cell by selecting it and clicking the Run button to the right (or Run All on top), or by pressing Control + Enter.

    The result will be displayed directly in the notebook.

    Try running the Python cell below:

    # Run this cell to see the result (click on Run on the right, or control+enter)
    100 * 1.75 * 20

    Modify any of the numbers and rerun the cell.

    You can add a Markdown, Python, or SQL cell by clicking on the Add Markdown, Add Code, and Add SQL buttons that appear as you move the mouse pointer near the bottom of any cell.

    Here at DataCamp, we call our interactive notebook Workspace. You can find out more about Workspace here.

    1️⃣ Python 🐍 - CO2 Emissions

    Now let's move on to the competition and challenge.

    📖 Background

    You volunteer for a public policy advocacy organization in Canada, and your colleague asked you to help her draft recommendations for guidelines on CO2 emissions rules.

    After researching emissions data for a wide range of Canadian vehicles, she would like you to investigate which vehicles produce lower emissions.

    💾 The data I

    You have access to seven years of CO2 emissions data for Canadian vehicles (source):

    • "Make" - The company that manufactures the vehicle.
    • "Model" - The vehicle's model.
    • "Vehicle Class" - Vehicle class by utility, capacity, and weight.
    • "Engine Size(L)" - The engine's displacement in liters.
    • "Cylinders" - The number of cylinders.
    • "Transmission" - The transmission type: A = Automatic, AM = Automatic Manual, AS = Automatic with select shift, AV = Continuously variable, M = Manual, 3 - 10 = the number of gears.
    • "Fuel Type" - The fuel type: X = Regular gasoline, Z = Premium gasoline, D = Diesel, E = Ethanol (E85), N = natural gas.
    • "Fuel Consumption Comb (L/100 km)" - Combined city/highway (55%/45%) fuel consumption in liters per 100 km (L/100 km).
    • "CO2 Emissions(g/km)" - The tailpipe carbon dioxide emissions in grams per kilometer for combined city and highway driving.

    The data comes from the Government of Canada's open data website.
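
    The combined figure in the "Fuel Consumption Comb (L/100 km)" column is described above as a 55%/45% weighting of city and highway consumption. As a minimal sketch of that arithmetic, assuming separate city and highway ratings are available (the function name and the input values below are hypothetical and do not come from the dataset):

```python
# Hypothetical helper: combine city and highway ratings (both in L/100 km)
# using the 55% / 45% weighting given in the column description.
def combined_consumption(city, highway):
    return 0.55 * city + 0.45 * highway

# Made-up ratings for illustration only (not taken from the dataset).
example = combined_consumption(city=10.0, highway=7.0)
print(round(example, 2))
```

    Because city driving carries the larger weight, the combined value sits slightly closer to the city rating than to the highway rating.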

    # Import the pandas, numpy, matplotlib, and seaborn packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    # Load the data
    cars = pd.read_csv('data/co2_emissions_canada.csv')
    # create numpy arrays
    cars_makes = cars['Make'].to_numpy()
    cars_models = cars['Model'].to_numpy()
    cars_classes = cars['Vehicle Class'].to_numpy()
    cars_engine_sizes = cars['Engine Size(L)'].to_numpy()
    cars_cylinders = cars['Cylinders'].to_numpy()
    cars_transmissions = cars['Transmission'].to_numpy()
    cars_fuel_types = cars['Fuel Type'].to_numpy()
    cars_fuel_consumption = cars['Fuel Consumption Comb (L/100 km)'].to_numpy()
    cars_co2_emissions = cars['CO2 Emissions(g/km)'].to_numpy()
    # Preview the dataframe
    print(cars.head())
    # Look at the first ten items in the CO2 emissions array
    print(cars_co2_emissions[:10])

    💪 Challenge I

    Help your colleague gain insights on the type of vehicles that have lower CO2 emissions. Include:

    1. What is the median engine size in liters?
    2. What is the average fuel consumption for regular gasoline (Fuel Type = X), premium gasoline (Z), ethanol (E), and diesel (D)?
    3. What is the correlation between fuel consumption and CO2 emissions?
    4. Which vehicle class has lower average CO2 emissions, 'SUV - SMALL' or 'MID-SIZE'?
    5. What are the average CO2 emissions for all vehicles? For vehicles with an engine size of 2.0 liters or smaller?
    6. Any other insights you found during your analysis?
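
    Most of the questions above reduce to two NumPy operations: filtering one array with a boolean mask built from another, and computing a correlation coefficient. The sketch below shows both on small invented arrays that only stand in for the real columns, so the printed numbers say nothing about the actual dataset.

```python
import numpy as np

# Toy arrays standing in for the real columns (all values are invented).
fuel_types = np.array(["X", "Z", "X", "D", "Z", "E"])
fuel_consumption = np.array([8.0, 10.0, 9.0, 7.5, 11.0, 14.0])
co2_emissions = np.array([186.0, 233.0, 210.0, 202.0, 256.0, 230.0])
engine_sizes = np.array([1.6, 3.0, 2.0, 2.0, 3.5, 5.0])

# Boolean mask: average fuel consumption for one fuel type.
mean_regular = fuel_consumption[fuel_types == "X"].mean()

# Pearson correlation between two numeric arrays.
corr = np.corrcoef(fuel_consumption, co2_emissions)[0, 1]

# Average emissions for engines of 2.0 L or smaller.
mean_small = co2_emissions[engine_sizes <= 2.0].mean()

print(mean_regular, corr, mean_small)
```

    The same pattern applies to the arrays created in the loading cell, for example cars_fuel_consumption[cars_fuel_types == 'X'].mean().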

    First of all:

    Let's check the structure of the data to understand what we are dealing with: whether any values are missing or null, and the dtype of each column.
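
    One way to run this kind of structure check with pandas is sketched below. It uses a small invented DataFrame (named sample here, with one value deliberately missing) in place of the real cars table, so the output is illustrative only.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real table; values are invented and one is
# missing on purpose so the check has something to report.
sample = pd.DataFrame({
    "Make": ["ACURA", "BMW", "FORD"],
    "Engine Size(L)": [2.0, 3.0, np.nan],
    "Cylinders": [4, 6, 8],
})

# Column names, non-null counts, and dtypes in one summary.
sample.info()

# Number of missing values in each column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Dtype of each column on its own.
print(sample.dtypes)
```

    Running the same three calls on cars would show whether any column needs cleaning before the analysis.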

    First question: What is the median engine size in liters?

    To find the median engine size, I use np.median, which returns the median of the array elements. The array is cars_engine_sizes, created in the data-loading cell above. The median engine size is 3.0 L.
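
    As a minimal sketch of how np.median behaves, the example below uses a short invented array (sample_engine_sizes is a hypothetical name) in place of cars_engine_sizes, so its result differs from the 3.0 L reported for the real data.

```python
import numpy as np

# Invented engine sizes in liters, for illustration only.
sample_engine_sizes = np.array([1.5, 2.0, 2.4, 3.5, 5.0, 3.0])

# np.median returns the middle value of the sorted data, or the mean
# of the two middle values when the number of elements is even.
median_size = np.median(sample_engine_sizes)
print(median_size)
```

    On the real data the call is np.median(cars_engine_sizes).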