Skip to content
Exploratory Data Analysis in SQL for Absolute Beginners
  • AI Chat
  • Code
  • Report
  • Exploratory Data Analysis in SQL for Absolute Beginners

    We'll be working with data from the Climate change adaptation innovation in the water sector in Africa paper which can be found here.

    This study looked at the response of technology to water vulnerability created by climate change in Africa.

    The data used for adaptation technology was water-related patent data. The water stress index accounts for things like projected change of annual runoff, projected change of annual groundwater recharge, fresh water withdrawal rate, water dependency ratio, dam capacity, and access to reliable drinking water. A higher index indicates higher vulnerability.

    The other variables are used to define the country's size (GDP), institutional effectiveness, research and development activity, and knowledge base.

    The fields included in this dataset are:

    • year (data has been pooled for the following years: 1990, 2000, 2005, and 2010 to 2016)
    • adaptation technologies
    • openness to trade (trade as percentage of gross domestic product)
    • time required to register property (calendar days)
    • gross domestic product per capita
    • employers (total)
    • gross enrolment ratio
    • water stress index

    Note that we have shortened the field names in our dataset for easier coding!

    Query the table

    1. Query the full table
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT *
    FROM climate;
    1. Query the country and water_stress_index fields and order by descending order of the water_stress_index field
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT country, water_stress_index
    FROM climate
    ORDER BY water_stress_index DESC;
    1. Query the country, year, and gdp_per_capita field to get a list of the country names and their respective GDP; order by the GDP in ascending order but only view the top 10 values
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT DISTINCT country, year, gdp_per_capita
    FROM climate
    ORDER BY gdp_per_capita
    LIMIT 10;

    Filter the data

    1. Filter the data to see the country and year where the water_stress_index was between 0.5 and 0.6
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT country, year
    FROM climate
    WHERE water_stress_index
    BETWEEN 0.5 AND 0.6;
    1. This time, filter the data to see the countries that start with the letter E or S and have a water_stress_index above 0.5
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT country, water_stress_index
    FROM climate
    WHERE (country LIKE 'E%' OR country LIKE 'S%') AND water_stress_index > 0.5;
    

    Aggregate, group, and sort the data