Climate Change and Impacts in Africa
According to the United Nations, climate change refers to long-term shifts in temperatures and weather patterns. Such shifts can be natural, due to changes in the sun’s activity or large volcanic eruptions. But since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil, and gas.
The consequences of climate change now include, among others, intense droughts, water scarcity, severe fires, rising sea levels, flooding, melting polar ice, catastrophic storms, and declining biodiversity.
You work for a non-governmental organization tasked with reporting on the state of climate change in Africa at the upcoming African Union Summit. The head of analytics has provided you with the IEA-EDGAR CO2 dataset, which you will clean, combine, and analyze to create a report on the state of climate change in Africa. You will also provide insights on the impact of climate change on African regions (with four countries, one from each African region, as case studies).
The dataset, IEA-EDGAR CO2, is a component of the EDGAR (Emissions Database for Global Atmospheric Research) Community GHG database version 7.0 (2022) including or based on data from IEA (2021) Greenhouse Gas Emissions from Energy, as modified by the Joint Research Centre. The data source was the EDGARv7.0_GHG website provided by Crippa et al. (2022) and with DOI.
The dataset contains three sheets - IPCC 2006, 1PCC 1996, and TOTALS BY COUNTRY - on the amount of CO2 (a greenhouse gas) generated by countries between 1970 and 2021. You can download the dataset from your workspace or inspect the dataset directly here.
TOTALS BY COUNTRY
This sheet contains the annual CO2 (kt) produced between 1970 - 2021 in each country. The relevant columns in this sheet are:
Columns | Description |
C_group_IM24_sh | The region of the world |
Country_code_A3 | The country code |
Name | The name of the country |
Y_1970 - Y_2021 | The amount of CO2 (kt) from 1970 - 2021 |
IPCC 2006 and 1PCC 1996
These sheets contain the amount of CO2 by country and the industry responsible.
Columns | Description |
C_group_IM24_sh | The region of the world |
Country_code_A3 | The country code |
Name | The name of the country |
Y_1970 - Y_2021 | The amount of CO2 (kt) from 1970 - 2021 |
ipcc_code_2006_for_standard_report_name | The industry responsible for generating CO2 |
The head of analytics in your organization has specifically asked you to do the following:
- Clean and tidy the datasets.
- Create a line plot to show the trend of CO2 levels across the African regions.
- Determine the relationship between time (Year) and CO2 levels across the African regions.
- Determine if there is a significant difference in the CO2 levels among the African Regions.
- Determine the most common (top 5) industries in each African region.
- Determine the industry responsible for the most amount of CO2 (on average) in each African Region.
- Predict the CO2 levels (at each African region) in the year 2025.
- Determine if CO2 levels affect annual temperature in the selected African countries.
# Setup
import pandas as pd
import numpy as np
import pingouin
from sklearn.linear_model import LinearRegression
from statsmodels.regression.linear_model import OLS
import seaborn as sns
import matplotlib.pyplot as plt
import inspect
plt.style.use('ggplot')
# The sheet names containing our datasets
sheet_names = ['IPCC 2006', 'TOTALS BY COUNTRY']
# The column names of the dataset start at row 11,
# so skip the first 10 rows
datasets = pd.read_excel('IEA_EDGAR_CO2_1970-2021.xlsx', sheet_name = sheet_names, skiprows = 10)
# we need only the African regions
african_regions = ['Eastern_Africa', 'Western_Africa', 'Southern_Africa', 'Northern_Africa']
ipcc_2006_africa = datasets['IPCC 2006'].query('C_group_IM24_sh in @african_regions')
totals_by_country_africa = datasets['TOTALS BY COUNTRY'].query('C_group_IM24_sh in @african_regions')
# Read the temperature dataset covering four African countries,
# one from each African region:
# Nigeria: Western Africa
# Ethiopia: Eastern Africa
# Tunisia: Northern Africa
# Mozambique: Southern Africa
temperatures = pd.read_csv('temperatures.csv')
# The solution code and the test runner
import tests as runner
import solutions
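The region filter in the setup cell relies on DataFrame.query, where the @ prefix refers to a Python variable in the surrounding scope. As a minimal sketch on a hypothetical toy frame (the country names and the non-African region label are placeholders, not values from the dataset), the same filter can be written with isin:

```python
import pandas as pd

# Hypothetical rows; only the column name C_group_IM24_sh comes from the dataset
toy = pd.DataFrame({
    'C_group_IM24_sh': ['Eastern_Africa', 'Some_Other_Region', 'Northern_Africa'],
    'Name': ['Country A', 'Country B', 'Country C'],
})
african_regions = ['Eastern_Africa', 'Western_Africa', 'Southern_Africa', 'Northern_Africa']

# query() resolves @african_regions from the enclosing Python scope
via_query = toy.query('C_group_IM24_sh in @african_regions')

# Equivalent boolean-mask form
via_isin = toy[toy['C_group_IM24_sh'].isin(african_regions)]

print(via_query)
```

Both forms keep only the rows whose region is in the list; which one to use is mostly a matter of readability.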
Instruction 1: Clean and tidy the datasets
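- Rename C_group_IM24_sh to Region, Country_code_A3 to Code, and ipcc_code_2006_for_standard_report_name to Industry in the corresponding African datasets.
- Drop IPCC_annex, ipcc_code_2006_for_standard_report, and Substance from the corresponding datasets.
- Melt Y_1970 to Y_2021 into two columns, Year and CO2. Drop rows where CO2 is missing.
- Convert Year to the int data type.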
# Rename columns in both datasets
columns_names = {'C_group_IM24_sh': 'Region',
'Country_code_A3': 'Code',
'ipcc_code_2006_for_standard_report_name': 'Industry'}
ipcc_2006_africa = ipcc_2006_africa.rename(columns=columns_names)
totals_by_country_africa = totals_by_country_africa.rename(columns=columns_names)
# Drop the columns that are not needed in either dataframe
drop_columns = ['IPCC_annex', 'ipcc_code_2006_for_standard_report', 'Substance']
ipcc_2006_africa = ipcc_2006_africa.drop(drop_columns, axis=1)
totals_by_country_africa = totals_by_country_africa.drop(['IPCC_annex', 'Substance'], axis=1)
# Melt the year columns of both dataframes into two columns: Year and CO2
ipcc_2006_africa = ipcc_2006_africa.melt(id_vars=['Region', 'Code', 'Name', 'Industry', 'fossil_bio'], var_name='Year', value_name='CO2')
totals_by_country_africa = totals_by_country_africa.melt(id_vars=['Region', 'Code', 'Name'], var_name='Year', value_name='CO2')
# Drop null values in the CO2 column
ipcc_2006_africa = ipcc_2006_africa.dropna(subset=['CO2'])
# Convert Year to int: remove the 'Y_' prefix, then cast
ipcc_2006_africa['Year'] = ipcc_2006_africa['Year'].str.replace('Y_', '', regex=False).astype(int)
totals_by_country_africa['Year'] = totals_by_country_africa['Year'].str.replace('Y_', '', regex=False).astype(int)
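To make the reshaping step concrete, here is a minimal sketch of the melt, drop, and convert sequence on a small hypothetical frame. The country codes, names, and values are invented for illustration; only the column layout mirrors the cleaned sheets.

```python
import pandas as pd

# Hypothetical wide-format frame: one row per country, one column per year
wide = pd.DataFrame({
    'Region': ['Eastern_Africa', 'Western_Africa'],
    'Code': ['AAA', 'BBB'],            # placeholder country codes
    'Name': ['Country A', 'Country B'],
    'Y_1970': [1.0, None],             # missing value to show the dropna step
    'Y_1971': [2.0, 4.0],
})

tidy = (
    wide
    # One row per (country, year); year labels go to Year, values to CO2
    .melt(id_vars=['Region', 'Code', 'Name'], var_name='Year', value_name='CO2')
    # Remove rows with no CO2 measurement
    .dropna(subset=['CO2'])
    # Strip the 'Y_' prefix and cast the remainder to int
    .assign(Year=lambda d: d['Year'].str.replace('Y_', '', regex=False).astype(int))
)

print(tidy)
```

The two-row wide frame becomes three tidy rows, because the one (country, year) pair with a missing value is dropped.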
Instruction 2: Show the trend of CO2 levels across the African regions
Tasks
- Using totals_by_country_africa, create a line plot of CO2 in each Region to show the trend of CO2 levels by year.
# Trend of CO2 emissions per African region
sns.lineplot(x='Year', y='CO2', hue='Region', ci=None, data=totals_by_country_africa)
plt.ylabel('CO2 (kt)')
plt.title('CO2 levels across the African Regions between 1970 and 2021')
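Each region contains several countries per year, so sns.lineplot aggregates the repeated observations at each Year with its default estimator, the mean. Each line in the plot is therefore the average per-country CO2 in a region, not the regional total. The sketch below reproduces that aggregation in pandas on a hypothetical toy frame (country names and values are invented) and contrasts it with the regional sum:

```python
import pandas as pd

# Hypothetical tidy data: two countries per region, two years
toy = pd.DataFrame({
    'Region': ['Eastern_Africa'] * 4 + ['Western_Africa'] * 4,
    'Name': ['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D'],
    'Year': [1970, 1970, 1971, 1971] * 2,
    'CO2': [10.0, 30.0, 20.0, 40.0, 100.0, 300.0, 200.0, 400.0],
})

# Mean CO2 per region and year: what the line plot draws by default
mean_by_region_year = toy.groupby(['Region', 'Year'])['CO2'].mean().unstack('Region')

# Regional totals, in case summed emissions are the quantity of interest
total_by_region_year = toy.groupby(['Region', 'Year'])['CO2'].sum().unstack('Region')

print(mean_by_region_year)
print(total_by_region_year)
```

If regional totals are what the report needs, passing estimator='sum' to sns.lineplot is one option; that choice is a suggestion here, not something the original analysis does.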
Instruction 3: Determine the relationship between time (Year) and CO2 levels across the African regions
Tasks
- Using the totals_by_country_africa dataset, conduct a Spearman's correlation to determine the relationship between time (Year) and CO2 within each African Region.
- Save the results in a variable called relationship_btw_time_CO2.
# Correlation between Year and CO2 per Region
relationship_btw_time_CO2 = totals_by_country_africa.groupby('Region')[['Year', 'CO2']].corr(method='spearman')
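The grouped corr call returns a full 2x2 correlation matrix per region, stacked under a two-level index, so the Year-CO2 coefficient appears twice per region alongside the trivial 1.0 diagonal. As a minimal sketch on hypothetical data (region labels R1 and R2 and all values are invented), one coefficient per region can be pulled out with xs:

```python
import pandas as pd

# Hypothetical data: CO2 strictly rises in R1 and strictly falls in R2
toy = pd.DataFrame({
    'Region': ['R1'] * 4 + ['R2'] * 4,
    'Year': [1970, 1971, 1972, 1973] * 2,
    'CO2': [1.0, 2.0, 4.0, 8.0, 9.0, 7.0, 3.0, 1.0],
})

# One 2x2 Spearman matrix per region, indexed by (Region, variable)
corr_matrices = toy.groupby('Region')[['Year', 'CO2']].corr(method='spearman')

# Keep the 'Year' row of each matrix and read off its CO2 column
rho_by_region = corr_matrices.xs('Year', level=1)['CO2']

print(rho_by_region)
```

Because Spearman's correlation works on ranks, the strictly increasing series gives a coefficient of 1 and the strictly decreasing one gives -1, regardless of how non-linear the growth is.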
Instruction 4: Determine if there is a significant difference in the CO2 levels among the African Regions
- Using totals_by_country_africa, conduct an ANOVA using pingouin.anova() on the CO2 by Region. Save the results as aov_results.
- Conduct a posthoc test (with Bonferroni correction) using pingouin.pairwise_ttests() to find the source of the significant difference. Save the results as pw_ttest_result.
- Is it true that the CO2 levels of the Southern_Africa and Northern_Africa regions do not differ significantly? The previous task should provide you with the answer.
# Conduct ANOVA test on CO2 by Region
aov_results = pingouin.anova(data=totals_by_country_africa, dv='CO2', between='Region')
# Conduct pairwise_ttest on CO2 by Region
pw_ttest_result = pingouin.pairwise_ttests(data=totals_by_country_africa, dv='CO2', between='Region', padjust='bonf')
print('', pw_ttest_result, sep='\n')
# Is it true that the `CO2` levels of the `Southern_Africa` and `Northern_Africa` regions do not differ significantly?
print('\nThe p-value for the contrast between these two regions is 0.079, which is greater than the usual significance level of 0.05; therefore, it is reasonable to conclude that CO2 levels do not differ significantly between these two regions.')
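The padjust='bonf' argument applies a Bonferroni correction: each uncorrected p-value is multiplied by the number of comparisons and capped at 1. With four regions there are six pairwise comparisons. The sketch below shows the arithmetic on hypothetical uncorrected p-values; the numbers are invented for illustration and are not results from the dataset.

```python
import numpy as np

# Hypothetical uncorrected p-values for the 6 pairwise comparisons among 4 regions
p_uncorrected = np.array([0.001, 0.013, 0.020, 0.040, 0.300, 0.700])

# Bonferroni: multiply by the number of tests, then cap at 1
n_tests = len(p_uncorrected)
p_bonferroni = np.minimum(p_uncorrected * n_tests, 1.0)

# Compare the corrected p-values against the usual 0.05 threshold
significant = p_bonferroni < 0.05

print(p_bonferroni)
print(significant)
```

A comparison with an uncorrected p-value of 0.013 would pass the 0.05 threshold on its own, but after correction it becomes 0.078 and no longer does. This is why the corrected column, not the uncorrected one, should be read when answering the question above.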
Instruction 5: Determine the most common (top 5) industries in each African region.
- Group the ipcc_2006_africa data by Region and Industry.
- Count the occurrences of each Industry within each Region and name it Count.
- Sort the data within each region group by Count in descending order.
- Get the top 5 industries for each region.
- Save it to the variable top_5_industries for each region.
# Top FIVE industries in each African region
# Group data by Region and Industry and count occurrences of each Industry within each Region
grouped_ipcc_2006_africa = ipcc_2006_africa.groupby(['Region','Industry']).size().reset_index(name = 'Count')
# Sort data by Count in descending order in each region
grouped_ipcc_2006_africa_sorted = grouped_ipcc_2006_africa.sort_values(['Region', 'Count'], ascending=[True, False])
# Top 5 industries per region
top_5_industries = grouped_ipcc_2006_africa_sorted.groupby('Region').head(5).reset_index(drop = True)
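Note that "most common" here means the number of rows (country-year records) in which an industry appears, not the amount of CO2 it emits. As a minimal sketch of the same count, sort, and head pattern on a hypothetical toy frame (region and industry labels are invented), taking the top 2 instead of the top 5:

```python
import pandas as pd

# Hypothetical records: one row per occurrence of an industry in a region
toy = pd.DataFrame({
    'Region': ['R1'] * 6 + ['R2'] * 4,
    'Industry': ['Energy', 'Energy', 'Energy', 'Cement', 'Cement', 'Glass',
                 'Cement', 'Cement', 'Cement', 'Energy'],
})

# Count rows per (Region, Industry) pair
counts = toy.groupby(['Region', 'Industry']).size().reset_index(name='Count')

# Sort by Count within each region, then keep the first 2 rows per region
top_2 = (
    counts.sort_values(['Region', 'Count'], ascending=[True, False])
          .groupby('Region')
          .head(2)
          .reset_index(drop=True)
)

print(top_2)
```

head(n) keeps the first n rows of each group in their current order, which is why the sort has to come first.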