Exploratory Data Analysis in SQL for Absolute Beginners
We'll be working with data from the paper "Climate change adaptation innovation in the water sector in Africa", which can be found here.
This study looked at the response of technology to water vulnerability created by climate change in Africa.
Adaptation technology was measured using water-related patent data. The water stress index accounts for factors such as the projected change in annual runoff, the projected change in annual groundwater recharge, the freshwater withdrawal rate, the water dependency ratio, dam capacity, and access to reliable drinking water. A higher index indicates higher vulnerability.
The other variables describe each country's size (GDP), institutional effectiveness, research and development activity, and knowledge base.
The fields included in this dataset are:
- year (data has been pooled for the following years: 1990, 2000, 2005, and 2010 to 2016)
- adaptation technologies
- openness to trade (trade as percentage of gross domestic product)
- time required to register property (calendar days)
- gross domestic product per capita
- employers (total)
- gross enrolment ratio
- water stress index
Note that we have shortened the field names in our dataset for easier coding!
Query the table
- Query the full table
SELECT *
FROM climate;
- Query the country and water_stress_index fields and order the results by water_stress_index in descending order
SELECT country, water_stress_index
FROM climate
ORDER BY water_stress_index DESC;
- Query the country, year, and gdp_per_capita fields to get a list of the countries and their respective GDP per capita; order by gdp_per_capita in ascending order and view only the first 10 rows (the 10 lowest values)
SELECT DISTINCT country, year, gdp_per_capita
FROM climate
ORDER BY gdp_per_capita
LIMIT 10;
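To see how DISTINCT, ORDER BY, and LIMIT work together, here is a small sketch you can run on its own. The climate_demo table, the placeholder country names, and all the values are invented for illustration; they are not taken from the paper's dataset.

```sql
-- Hypothetical demo table; names and values are invented for illustration only.
DROP TABLE IF EXISTS climate_demo;
CREATE TABLE climate_demo (
    country        VARCHAR(50),
    year           INTEGER,
    gdp_per_capita DECIMAL(10, 2)
);

INSERT INTO climate_demo (country, year, gdp_per_capita) VALUES
    ('Eastland',  2010,  900.00),
    ('Eastland',  2010,  900.00),  -- exact duplicate of the row above
    ('Eastland',  2011,  950.00),
    ('Southland', 2010,  400.00),
    ('Northland', 2010, 1500.00);

-- DISTINCT removes rows that are identical across ALL selected columns,
-- so the two Eastland/2010 rows collapse into one.
-- ORDER BY sorts ascending by default; LIMIT then keeps the first 3 rows.
SELECT DISTINCT country, year, gdp_per_capita
FROM climate_demo
ORDER BY gdp_per_capita
LIMIT 3;
-- Expected rows (number formatting varies by database):
--   Southland, 2010, 400
--   Eastland,  2010, 900
--   Eastland,  2011, 950
```

Because the sort is ascending, LIMIT returns the lowest values. Adding DESC after the ORDER BY column would return the highest ones instead.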
Filter the data
- Filter the data to see the country and year where the water_stress_index was between 0.5 and 0.6
SELECT country, year
FROM climate
WHERE water_stress_index BETWEEN 0.5 AND 0.6;
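One detail worth knowing is that BETWEEN includes both endpoints. The sketch below shows this on a tiny made-up table. The climate_demo table, the placeholder country names, and the values are assumptions for illustration only. The column is declared as DECIMAL so the boundary comparisons are not affected by floating-point rounding.

```sql
-- Hypothetical demo table; names and values are invented for illustration only.
DROP TABLE IF EXISTS climate_demo;
CREATE TABLE climate_demo (
    country            VARCHAR(50),
    year               INTEGER,
    water_stress_index DECIMAL(4, 2)
);

INSERT INTO climate_demo (country, year, water_stress_index) VALUES
    ('Eastland',  2010, 0.49),
    ('Southland', 2010, 0.50),
    ('Northland', 2010, 0.55),
    ('Westland',  2010, 0.60),
    ('Eastland',  2011, 0.61);

-- BETWEEN is inclusive: the rows at exactly 0.50 and 0.60 are returned.
SELECT country, year
FROM climate_demo
WHERE water_stress_index BETWEEN 0.5 AND 0.6
ORDER BY water_stress_index;
-- Expected rows: Southland 2010, Northland 2010, Westland 2010

-- The same filter written with explicit comparisons.
SELECT country, year
FROM climate_demo
WHERE water_stress_index >= 0.5
  AND water_stress_index <= 0.6
ORDER BY water_stress_index;
```

If you want to exclude one of the endpoints, use the explicit form and change >= to > or <= to <.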
- This time, filter the data to see the countries that start with the letter E or S and have a water_stress_index above 0.5
SELECT country, water_stress_index
FROM climate
WHERE (country LIKE 'E%' OR country LIKE 'S%') AND water_stress_index > 0.5;
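The parentheses in this query matter because SQL evaluates AND before OR. The sketch below compares the two versions on a small made-up table. As before, climate_demo, the placeholder country names, and the values are invented for illustration and do not come from the paper's dataset.

```sql
-- Hypothetical demo table; names and values are invented for illustration only.
DROP TABLE IF EXISTS climate_demo;
CREATE TABLE climate_demo (
    country            VARCHAR(50),
    year               INTEGER,
    water_stress_index DECIMAL(4, 2)
);

INSERT INTO climate_demo (country, year, water_stress_index) VALUES
    ('Eastland',  2010, 0.40),
    ('Eastland',  2011, 0.65),
    ('Southland', 2010, 0.70),
    ('Northland', 2010, 0.80);

-- With parentheses: (starts with E OR starts with S) AND index above 0.5.
SELECT country, year, water_stress_index
FROM climate_demo
WHERE (country LIKE 'E%' OR country LIKE 'S%') AND water_stress_index > 0.5
ORDER BY water_stress_index;
-- Expected rows: Eastland 2011 (0.65), Southland 2010 (0.70)

-- Without parentheses, AND is evaluated first, so this reads as:
-- starts with E OR (starts with S AND index above 0.5).
SELECT country, year, water_stress_index
FROM climate_demo
WHERE country LIKE 'E%' OR country LIKE 'S%' AND water_stress_index > 0.5
ORDER BY water_stress_index;
-- Expected rows: Eastland 2010 (0.40), Eastland 2011 (0.65), Southland 2010 (0.70)
```

The second query lets the Eastland row with an index of 0.40 slip through, which is why grouping the OR conditions in parentheses is the safer habit.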
Aggregate, group, and sort the data