Skip to content

Photo by Maxim Tolchinskiy on Unsplash

When factoring heat generation required for the manufacturing and transportation of products, Greenhouse gas emissions attributable to products, from food to sneakers to appliances, make up more than 75% of global emissions. (Source: The Carbon Catalogue https://www.nature.com/articles/s41597-022-01178-9)

Our data, which is publicly available on nature.com, contains product carbon footprints (PCFs) for various companies. PCFs are the greenhouse gas emissions attributable to a given product, measured in CO2 (carbon dioxide equivalent).

This data is stored in a PostgreSQL database containing one table, product_emissions, which looks at PCFs by product as well as the stage of production that these emissions occurred. Here's a snapshot of what product_emissions contains in each column:

product_emissions

fielddata type
idVARCHAR
yearINT
product_nameVARCHAR
companyVARCHAR
countryVARCHAR
industry_groupVARCHAR
weight_kgNUMERIC
carbon_footprint_pcfNUMERIC
upstream_percent_total_pcfVARCHAR
operations_percent_total_pcfVARCHAR
downstream_percent_total_pcfVARCHAR

You'll use this data to examine the carbon footprint of each industry in the dataset!

Setting up the query from creating CTE, Filtering, and joining all queries

Spinner
DataFrameas
max_year
variable
-- Setting up max_year
WITH max_year AS (
	SELECT MAX(year) AS year
	FROM product_emissions
)
-- Show the output
SELECT * FROM max_year;
Spinner
DataFrameas
column_setup
variable
-- Set up the columns needed for the query table
SELECT 
	industry_group, -- Lists of industries
	COUNT(DISTINCT company) AS num_companies, -- Count the unique numbers of company 
	ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint -- Get the total carbon footprints per industries
FROM product_emissions
GROUP BY industry_group
LIMIT 5;
Spinner
DataFrameas
query_table
variable
-- Setting up query_table
WITH query_table AS (
	SELECT 
		industry_group,
		COUNT(DISTINCT company) AS num_companies,  
		ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint 
	FROM product_emissions
	GROUP BY industry_group
)

-- Show the output
SELECT * FROM query_table;
Spinner
DataFrameas
final_query
variable
-- Getting the Final Query
WITH max_year AS (
	SELECT MAX(year) AS year
	FROM product_emissions
),
query_table AS (
	SELECT 
		industry_group,
		COUNT(DISTINCT company) AS num_companies,
		ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint 
	FROM product_emissions
	WHERE year IN (SELECT year FROM max_year)
	GROUP BY industry_group
)
-- Show final query output
SELECT *
FROM query_table
ORDER BY total_industry_footprint DESC;

Carbon Emissions by Industry

Spinner
DataFrameas
carbon_emissions_by_industry
variable
WITH max_year AS (
	SELECT MAX(year) AS year
	FROM product_emissions
),
query_table AS (
	SELECT 
		industry_group,
		COUNT(DISTINCT company) AS num_companies,
		ROUND(SUM(carbon_footprint_pcf), 1) AS total_industry_footprint 
	FROM product_emissions
	WHERE year IN (SELECT year FROM max_year)
	GROUP BY industry_group
)
-- Show final query output
SELECT *
FROM query_table
ORDER BY total_industry_footprint DESC;