Exploratory Data Analysis in SQL for Absolute Beginners
We'll be working with data from the paper "Climate change adaptation innovation in the water sector in Africa".
This study looked at the response of technology to water vulnerability created by climate change in Africa.
Adaptation technology is measured using water-related patent data. The water stress index accounts for factors such as the projected change in annual runoff, the projected change in annual groundwater recharge, the fresh water withdrawal rate, the water dependency ratio, dam capacity, and access to reliable drinking water. A higher index indicates higher vulnerability.
The other variables are used to define the country's size (GDP), institutional effectiveness, research and development activity, and knowledge base.
The fields included in this dataset are:
- year (data has been pooled for the following years: 1990, 2000, 2005, and 2010 to 2016)
- adaptation technologies
- openness to trade (trade as percentage of gross domestic product)
- time required to register property (calendar days)
- gross domestic product per capita
- employers (total)
- gross enrolment ratio
- water stress index
Note that we have shortened the field names in our dataset for easier coding!
To consult the solution, head over to the file browser and select notebook-solution.ipynb.
Query the table
- Query the full table
- Query the `country` and `water_stress_index` fields and order by the `water_stress_index` field in descending order
- Query the `country`, `year`, and `gdp_per_capita` fields to get a list of the country names and their respective GDP per capita; order by GDP per capita in ascending order and show only the first 10 rows
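The three queries above could be sketched as follows. This is only an illustration under stated assumptions: the table name `water_data` is hypothetical (use whatever name the table has in your notebook), the rows and country names are invented placeholders rather than values from the paper, and Python's built-in `sqlite3` module stands in for the notebook's SQL cells so the example runs on its own.

```python
import sqlite3

# Assumption: "water_data" is a hypothetical table name, and the rows below
# are invented placeholders, not values from the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE water_data ("
    "country TEXT, year INTEGER, gdp_per_capita REAL, water_stress_index REAL)"
)
conn.executemany(
    "INSERT INTO water_data VALUES (?, ?, ?, ?)",
    [
        ("Exampleland", 2010, 2000.0, 0.62),
        ("Samplestan", 2010, 1000.0, 0.55),
        ("Testonia", 2010, 3000.0, 0.40),
    ],
)

# 1. Query the full table: * selects every column.
all_rows = conn.execute("SELECT * FROM water_data").fetchall()

# 2. Two fields, sorted so the highest water stress index comes first.
by_stress = conn.execute(
    "SELECT country, water_stress_index "
    "FROM water_data "
    "ORDER BY water_stress_index DESC"
).fetchall()

# 3. Three fields, lowest GDP per capita first, keeping at most 10 rows.
lowest_gdp = conn.execute(
    "SELECT country, year, gdp_per_capita "
    "FROM water_data "
    "ORDER BY gdp_per_capita ASC "
    "LIMIT 10"
).fetchall()

for row in lowest_gdp:
    print(row)
```

In a SQL cell, only the text inside the quotes is needed, with the table name replaced by the real one. `ASC` is the default sort direction, so writing it is optional; it is spelled out here to contrast with `DESC`.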
Filter the data
- Filter the data to see the `country` and `year` where the `water_stress_index` was between `0.5` and `0.6`
- This time, filter the data to see the countries that start with the letter `E` or `S` and have a `water_stress_index` above `0.5`
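One possible way to write the two filters above is shown below. As before, this is a sketch under assumptions: `water_data` is a hypothetical table name, the rows and country names are invented placeholders, and Python's built-in `sqlite3` module is used only so the example is runnable by itself.

```python
import sqlite3

# Assumption: "water_data" is a hypothetical table name, and the rows below
# are invented placeholders, not values from the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE water_data (country TEXT, year INTEGER, water_stress_index REAL)"
)
conn.executemany(
    "INSERT INTO water_data VALUES (?, ?, ?)",
    [
        ("Exampleland", 2010, 0.62),
        ("Samplestan", 2010, 0.55),
        ("Testonia", 2010, 0.58),
        ("Elsewhere", 2010, 0.45),
    ],
)

# Filter 1: BETWEEN keeps values inside the range, including both endpoints.
in_range = conn.execute(
    "SELECT country, year "
    "FROM water_data "
    "WHERE water_stress_index BETWEEN 0.5 AND 0.6 "
    "ORDER BY country"
).fetchall()

# Filter 2: LIKE 'E%' matches names starting with E (% means "any characters").
# The parentheses matter: AND binds more tightly than OR, so without them the
# index condition would apply only to the names starting with S.
e_or_s = conn.execute(
    "SELECT country, water_stress_index "
    "FROM water_data "
    "WHERE (country LIKE 'E%' OR country LIKE 'S%') "
    "AND water_stress_index > 0.5 "
    "ORDER BY country"
).fetchall()

print(in_range)
print(e_or_s)
```

The `ORDER BY country` clauses are not required by the tasks; they are included only to make the output order predictable. Note that SQLite's `LIKE` ignores case for ASCII letters by default, while some other databases treat it as case-sensitive, so results for lowercase names may differ depending on the engine behind your notebook.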