Skip to content

Analyzing Carbon Footprints in SQL

Greenhouse gas emissions attributable to products—from food to sneakers to appliances—make up more than 75% of global emissions.

Cite: The Carbon Catalogue

Our dataset, which is publicly availably on nature.com, stores product carbon footprints (PCFs) for various companies. PCFs are the greenhouse gas emissions attributable to a given product, measured in CO2e (carbon dioxide equivalent).

To consult the solution, head over to the file browser and select notebook-solution.ipynb.

Connecting data

Set up connection

  • In the left side bar, go to Databases > Connect a database > PostgreSQL
  • Use the following credentials:
    • Integration name: SQL Carbon Emissions
    • Port: 5432
    • Hostname: workspacedemodb.datacamp.com
    • Database: emissions
    • Username: emissions_codealong
    • Password: emissions_codealong

Database contents

Our database contains one table, product_emissions, which looks at PCFs by product as well as the stage of production these emissions occured in.

fielddata_type
idVARCHAR
yearINT
product_nameVARCHAR
companyVARCHAR
countryVARCHAR
industry_groupVARCHAR
weight_kgNUMERIC
carbon_footprint_pcfNUMERIC
upstream_percent_total_pcfVARCHAR
operations_percent_total_pcfVARCHAR
downstream_percent_total_pcfVARCHAR

1: Coca-Cola's emissions

First, let's look at a small subset of the data: emissions reported by Coca-Cola. Coke is actually made up of multiple companies around the globe, so we'll make sure our query returns data for any company name that starts with "Coca-Cola". Coke used to report for every single different product it has, so we will limit the results to six.

  • Select all fields from the product_emissions table where the company name begins with "Coca-Cola", limiting to the first six results.
Spinner
DataFrameas
df
variable
-- Select all fields where the company name is Coca-Cola, limiting to the first six results
SELECT *
FROM product_emissions
WHERE company LIKE 'Coca-Cola%'
LIMIT 6;

A little background:

  • Upstream emissions: emissions that occur before the company's own operations such as emissions created by manufacturing bottles that Coke buys from suppliers
  • Operations emissions: emissions that the company creates directly, such as when Coke is bottling its product
  • Downstream emissions: emissions that occur after the product leaves the company, such as after Coke has sold drinks to McDonald's

2: Most recent data

We'll focus on recent emissions data during this code-along. When was the most recent data collected?

  • Return the most recent year for which data was collected.
Spinner
DataFrameas
df
variable
-- Return the most recent year for which data was collected
SELECT MAX(year)
FROM product_emissions;

3: Targeting major emitters

What are the industries with the most emissions in 2017 (the most recent year that data is available)?

  • Return the industry_group and a rounded total of carbon_footprint_pcf for each industry, aliasing as total_industry_footprint.
  • Limit to data for 2017 and order by total_industry_footprint.