Climate Change and Impacts in Africa - (Python)
  • Climate Change and Impacts in Africa

    According to the United Nations, climate change refers to long-term shifts in temperatures and weather patterns. Such shifts can be natural, due to changes in the sun’s activity or large volcanic eruptions. But since the 1800s, human activities have been the main driver of climate change, primarily due to the burning of fossil fuels like coal, oil, and gas.

    The consequences of climate change now include, among others, intense droughts, water scarcity, severe fires, rising sea levels, flooding, melting polar ice, catastrophic storms, and declining biodiversity.

    You work for a non-governmental organization tasked with reporting the state of climate change in Africa at the upcoming African Union Summit. The head of analytics has provided you with the IEA-EDGAR CO2 dataset, which you will clean, combine, and analyze to create a report on the state of climate change in Africa. You will also provide insights on the impact of climate change on African regions (with four countries, one from each African region, as case studies).

    Dataset

    The dataset, IEA-EDGAR CO2, is a component of the EDGAR (Emissions Database for Global Atmospheric Research) Community GHG database version 7.0 (2022) including or based on data from IEA (2021) Greenhouse Gas Emissions from Energy, www.iea.org/statistics, as modified by the Joint Research Centre. The data source was the EDGARv7.0_GHG website provided by Crippa et al. (2022) and with DOI.

    The dataset contains three sheets (IPCC 2006, IPCC 1996, and TOTALS BY COUNTRY) on the amount of CO2 (a greenhouse gas) generated by countries between 1970 and 2021. You can download the dataset from your workspace or inspect the dataset directly here.

    TOTALS BY COUNTRY SHEET

    This sheet contains the annual CO2 (kt) produced in each country between 1970 and 2021. The relevant columns in this sheet are:

    Columns             Description
    C_group_IM24_sh     The region of the world
    Country_code_A3     The country code
    Name                The name of the country
    Y_1970 - Y_2021     The amount of CO2 (kt) from 1970 to 2021

    IPCC 2006

    This sheet contains the amount of CO2 by country and the industry responsible.

    Columns                                     Description
    C_group_IM24_sh                             The region of the world
    Country_code_A3                             The country code
    Name                                        The name of the country
    Y_1970 - Y_2021                             The amount of CO2 (kt) from 1970 to 2021
    ipcc_code_2006_for_standard_report_name     The industry responsible for generating CO2

    Instructions

    The head of analytics in your organization has specifically asked you to do the following:

    1. Clean and tidy the datasets.
    2. Create a line plot to show the trend of CO2 levels across the African regions.
    3. Determine the relationship between time (Year) and CO2 levels across the African regions.
    4. Determine if there is a significant difference in the CO2 levels among the African Regions.
    5. Determine the most common (top 5) industries in each African region.
    6. Determine the industry responsible for the largest amount of CO2 (on average) in each African region.
    7. Predict the CO2 levels (in each African region) in the year 2025.
    8. Determine if CO2 levels affect annual temperature in the selected African countries.

    IMPORTANT

    • Make a copy of this workspace.

    • Write your code within the cells provided for you. Each of those cells contains the comment "#Your code here".

    • Next, run the cells containing the checks. Please do not modify these cells. To pass a check, make sure you create the variables mentioned in the instruction tasks. These variables will be verified for correctness: if the cell outputs nothing, your solution passes; otherwise the cell will throw an error. We included messages to help you fix these errors.

    • If you're stuck (even after reviewing related DataCamp courses), then uncomment and run the cell which contains the source code of the solution. For example, print(inspect.getsource(solutions.solution_one)) will display the solution for instruction 1. We advise you to only look at the solution to your current problem.

    • Note that workspaces created inside the "I4G 23/24" group are always private to the group and cannot be made public.

    • If after completion you want to showcase your work on your DataCamp portfolio, use "File > Make a copy" to copy over the workspace to your personal account. Then make it public so it shows up on your DataCamp portfolio.

    • We hope you enjoy working on this project as much as we enjoyed creating it. Cheers!

    # Setup
    import pandas as pd
    import numpy as np
    import pingouin
    from sklearn.linear_model import LinearRegression
    from statsmodels.regression.linear_model import OLS
    import seaborn as sns
    import matplotlib.pyplot as plt
    import inspect
    
    plt.style.use('ggplot')
    # The sheet names containing our datasets
    sheet_names = ['IPCC 2006', 'TOTALS BY COUNTRY']
    
    # The column names of the dataset start at row 11,
    # so skip the first 10 rows
    datasets = pd.read_excel('IEA_EDGAR_CO2_1970-2021.xlsx', sheet_name = sheet_names, skiprows = 10)
    
    # we need only the African regions
    african_regions = ['Eastern_Africa', 'Western_Africa', 'Southern_Africa', 'Northern_Africa']
    
    ipcc_2006_africa = datasets['IPCC 2006'].query('C_group_IM24_sh in @african_regions')
    
    totals_by_country_africa = datasets['TOTALS BY COUNTRY'].query('C_group_IM24_sh in @african_regions')
    
    
    # Read the temperature dataset covering four African countries,
    # one from each African region:
    # Nigeria:    West Africa
    # Ethiopia:   East Africa
    # Tunisia:    North Africa
    # Mozambique: Southern Africa
    temperatures = pd.read_csv('temperatures.csv')
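
The `query('C_group_IM24_sh in @african_regions')` calls in the setup cell keep only the rows whose region appears in the `african_regions` list; the `@` prefix tells pandas to look up a Python variable. Below is a minimal sketch of that filter on a made-up frame. The rows, the `Other_region_*` labels, and the `toy` names are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Made-up rows that only mimic the region column of the EDGAR sheets
toy = pd.DataFrame({
    "C_group_IM24_sh": ["Eastern_Africa", "Other_region_1", "Northern_Africa", "Other_region_2"],
    "Name": ["Country A", "Country B", "Country C", "Country D"],
})

african_regions = ["Eastern_Africa", "Western_Africa", "Southern_Africa", "Northern_Africa"]

# '@african_regions' refers to the Python list defined above
toy_africa = toy.query("C_group_IM24_sh in @african_regions")

# The same filter written as a boolean mask
toy_africa_mask = toy[toy["C_group_IM24_sh"].isin(african_regions)]

print(toy_africa)
```

Both forms should select the same rows; `query()` is simply more compact inside a method chain.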

    Instruction 1: Clean and tidy the datasets

    Tasks

    • Rename C_group_IM24_sh to Region, Country_code_A3 to Code, and ipcc_code_2006_for_standard_report_name to Industry in the corresponding African datasets.
    • Drop IPCC_annex, ipcc_code_2006_for_standard_report, and Substance from the corresponding datasets.
    • Melt Y_1970 to Y_2021 into two columns, Year and CO2. Drop rows where CO2 is missing.
    • Convert Year to int type.

    Hints

    • Use df.rename() method to rename columns.
    • The df.drop() method can be used to drop columns.
    • You might find df.melt() or pd.melt() useful.
    • The df.column.astype(int) method can be used to convert a column to an integer type.
    # Your code here (for the learner)
    
    #For the ipcc_2006_africa dataframe
    ipcc_2006_africa = (ipcc_2006_africa
                        #rename the columns
                        .rename({'C_group_IM24_sh': 'Region',
                                 'Country_code_A3': 'Code',
                                 'ipcc_code_2006_for_standard_report_name': 'Industry'},
                                axis = 'columns')
                        # drop unneeded columns
                        .drop(['IPCC_annex','ipcc_code_2006_for_standard_report','Substance'], 
                              axis = 'columns') 
                        # convert from wide to long
                        .melt(id_vars=['Region','Code','Name','Industry','fossil_bio'],
                              var_name= 'Year', value_name = 'CO2') 
                        # drop null values
                        .dropna(subset = ['CO2'])
                       )
    # Remove the 'Y_' prefix and convert the 'Year' column to int
    ipcc_2006_africa['Year'] = ipcc_2006_africa.Year.str.replace('Y_', '', regex = False).astype(int)
    
    #For the totals_by_country_africa dataframe
    totals_by_country_africa = (totals_by_country_africa
                                #rename the columns
                                .rename({'C_group_IM24_sh':'Region','Country_code_A3':'Code'},axis = 'columns')
                                 # drop unneeded columns
                                .drop(['IPCC_annex','Substance'], axis = 'columns')
                                # convert from wide to long
                                .melt(id_vars=['Region','Code','Name'],var_name= 'Year', value_name = 'CO2')
                                # drop null values
                                .dropna(subset = ['CO2'])
                               )
    # Remove the 'Y_' prefix and convert the 'Year' column to int
    totals_by_country_africa['Year'] = totals_by_country_africa.Year.str.replace('Y_', '', regex = False).astype(int)
    #Result
    ipcc_2006_africa.head()
    #Result
    totals_by_country_africa.head()
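
To see what the wide-to-long step does, here is a small sketch on a made-up table. The country names, codes, and CO2 numbers are placeholders chosen for illustration, and `wide` / `tidy` are hypothetical variable names.

```python
import pandas as pd

# Made-up wide table: one row per country, one column per year
wide = pd.DataFrame({
    "Region": ["Eastern_Africa", "Northern_Africa"],
    "Code": ["AAA", "BBB"],              # placeholder codes
    "Name": ["Country A", "Country B"],  # placeholder names
    "Y_1970": [1.0, 10.0],
    "Y_1971": [2.0, None],               # a missing value, dropped below
})

tidy = (wide
        # every Y_* column becomes a (Year, CO2) pair of rows
        .melt(id_vars=["Region", "Code", "Name"], var_name="Year", value_name="CO2")
        # drop rows where CO2 is missing
        .dropna(subset=["CO2"])
        .reset_index(drop=True))

# 'Y_1970' -> 1970
tidy["Year"] = tidy["Year"].str.replace("Y_", "", regex=False).astype(int)

print(tidy)
```

Two countries times two years gives four rows after melting; one has a missing CO2 value, so three rows remain.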

    Instruction 2: Show the trend of CO2 levels across the African regions

    Tasks

    • Using totals_by_country_africa, create a line plot of CO2 vs. Year in each Region to show the trend of CO2 levels by year.

    Hints

    • Use sns.lineplot() to create a line plot.
    • Your plot should be similar to the one shown below.

    # Your code here
    
    # draw one line per Region; ci=None turns off the confidence band
    # (seaborn 0.12+ spells this errorbar=None)
    g = sns.lineplot(data = totals_by_country_africa, x = 'Year', y = 'CO2', hue = 'Region', ci = None)
    
    #set title and axis labels
    g.set_title('CO2 levels across the African regions between 1970 and 2021')
    g.set_ylabel('CO2 (kt)')
    
    # set legend and legend position
    plt.legend(loc = 0)
    
    # Result
    plt.show()
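
Each Region contains several countries per Year, so `sns.lineplot()` aggregates them before drawing one line per region (its default estimator is the mean). The sketch below reproduces that aggregation with `groupby()`. The numbers are made up and the region labels are placeholders.

```python
import pandas as pd

# Made-up long-format data: two countries in "North", one in "South"
toy = pd.DataFrame({
    "Region": ["North", "North", "North", "North", "South", "South"],
    "Name":   ["A", "B", "A", "B", "C", "C"],
    "Year":   [2000, 2000, 2001, 2001, 2000, 2001],
    "CO2":    [10.0, 30.0, 20.0, 40.0, 5.0, 7.0],
})

# Mean CO2 per Region and Year: the values a mean-based line plot would draw
regional_mean = (toy
                 .groupby(["Region", "Year"])["CO2"]
                 .mean()
                 .unstack("Region"))

print(regional_mean)
```

Reading the plot with this in mind helps: each line shows the average country in a region, not the region's total emissions.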

    Instruction 3: Determine the relationship between time (Year) and CO2 levels across the African regions

    Tasks

    • Using the totals_by_country_africa dataset, conduct a Spearman's correlation to determine the relationship between time (Year) and CO2 within each African Region.
    • Save the results in a variable called relationship_btw_time_CO2.

    Hints

    • Use df.groupby() and df.corr() methods.
    • Use the corr() method's method parameter to set the correlation type.
    # Your code here
    relationship_btw_time_CO2 = (totals_by_country_africa
                                 .groupby('Region')[['Year','CO2']]
                                 .corr(method = 'spearman')
                                )
    
    #Result
    relationship_btw_time_CO2
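
The grouped `corr()` call returns a 2x2 matrix per region, and the Year-vs-CO2 coefficient is the off-diagonal entry. Below is a sketch on made-up data showing how to pull that entry out. It also shows that Spearman's coefficient is Pearson's coefficient computed on ranks, so it measures monotonic rather than linear association. The names `toy`, `rho`, and `rho_a_from_ranks` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "Region": ["A"] * 5 + ["B"] * 5,
    "Year": [2000, 2001, 2002, 2003, 2004] * 2,
    "CO2": [1.0, 2.0, 4.0, 8.0, 16.0,   # rises every year, but not linearly
            5.0, 4.0, 3.0, 2.5, 1.0],   # falls every year
})

# One 2x2 Spearman matrix per region, stacked under a MultiIndex
corr = toy.groupby("Region")[["Year", "CO2"]].corr(method="spearman")

# Keep only the Year-vs-CO2 coefficient for each region
rho = corr.xs("Year", level=1)["CO2"]

# Spearman's rho equals Pearson's r computed on the ranks
region_a = toy.loc[toy["Region"] == "A", ["Year", "CO2"]]
rho_a_from_ranks = region_a.rank().corr(method="pearson").loc["Year", "CO2"]

print(rho)
```

In this toy data, region A rises monotonically and region B falls monotonically, so the coefficients sit at the two extremes of the scale.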

    Instruction 4: Determine if there is a significant difference in the CO2 levels among the African Regions

    Tasks

    • Using totals_by_country_africa, conduct an ANOVA using pingouin.anova() on the CO2 by Region. Save the results as aov_results.
    • Conduct a posthoc test (with Bonferroni correction) using pingouin.pairwise_tests() to find the source of the significant difference. Save the results as pw_ttest_result.
    • Is it true that the CO2 levels of the Southern_Africa and Northern_Africa regions do not differ significantly? The previous task should provide you with the answer.
    # Your code here
    
    #ANOVA 
    aov_results = pingouin.anova(
        totals_by_country_africa,
        dv= 'CO2',
        between= 'Region'
    )
    
    #Pairwise t-test
    pw_ttest_result = pingouin.pairwise_tests(
        totals_by_country_africa,
        dv = 'CO2',
        between = 'Region',
        padjust= 'bonf'
    )
    
    # Result
    print(aov_results,'\n')
    print(pw_ttest_result)
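
With four regions, the post hoc step compares six pairs, and the Bonferroni correction compensates for that by multiplying each uncorrected p-value by the number of comparisons (capped at 1). The sketch below shows only the arithmetic. The p-values are hypothetical placeholders, not results from the dataset.

```python
from itertools import combinations

african_regions = ["Eastern_Africa", "Western_Africa", "Southern_Africa", "Northern_Africa"]

# Every unordered pair of regions is one pairwise test
pairs = list(combinations(african_regions, 2))
n_tests = len(pairs)

# Hypothetical uncorrected p-values, one per test (NOT results from the dataset)
p_unc = [0.001, 0.020, 0.300, 0.004, 0.049, 0.900]

# Bonferroni: multiply by the number of tests and cap at 1
p_bonf = [min(1.0, p * n_tests) for p in p_unc]

alpha = 0.05
significant = [p < alpha for p in p_bonf]

print(n_tests)
print(p_bonf)
print(significant)
```

Note how a raw p-value just under 0.05 no longer counts as significant after the correction, which is why the corrected column is the one to read when answering the last task.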

    Instruction 5: Determine the most common (top 5) industries in each African region.

    Tasks

    • Group the ipcc_2006_africa data by Region and Industry.
    • Count the occurrences of each Industry within each Region and name it Count.
    • Sort the data within each region group by Count in descending order.
    • Get the top 5 industries for each region.
    • Save the result in a variable called top_5_industries.

    Hints

    • You can use the df.groupby() method to group the data by multiple columns.
    • The df.value_counts() method can be useful for counting occurrences.
    • The df.sort_values() method can help you with sorting.
    # Your code here
    
    # Group by Region and Industry. Use size to get instance count. reset_index to reset indices
    grouping = (ipcc_2006_africa
                        .groupby(['Region','Industry'])
                        .size()
                        .reset_index(name='Count')
               )
    # Sort by Count within each Region, then keep the first 5 rows per Region using .head()
    top_5_industries = (grouping
                         .sort_values(by = ['Region', 'Count'], ascending = [True,False])
                         .groupby('Region')
                         .head(5)
                         .reset_index(drop = True)
                       )
    
    #Result
    top_5_industries
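
The count, sort, and take-the-first-N pattern used above can be checked on a small made-up frame. This sketch keeps the top 2 instead of the top 5 so the result is easy to verify by eye. The region and industry labels are placeholders.

```python
import pandas as pd

# Made-up rows: one row per (country, industry, ...) record in the real data
toy = pd.DataFrame({
    "Region":   ["East", "East", "East", "East", "East", "East",
                 "West", "West", "West"],
    "Industry": ["Energy", "Energy", "Energy", "Transport", "Transport", "Cement",
                 "Cement", "Cement", "Energy"],
})

# Number of rows for each (Region, Industry) pair
counts = (toy
          .groupby(["Region", "Industry"])
          .size()
          .reset_index(name="Count"))

# Largest counts first within each Region, then keep the first 2 rows per Region
top_2 = (counts
         .sort_values(["Region", "Count"], ascending=[True, False])
         .groupby("Region")
         .head(2)
         .reset_index(drop=True))

print(top_2)
```

One caveat with this pattern: when several industries tie at the cut-off, `head()` keeps whichever rows happen to come first after sorting, so ties deserve a manual look.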