
GoodThought NGO has been a catalyst for positive change, focusing its efforts on education, healthcare, and sustainable development to make a significant difference in communities worldwide. With this mission, GoodThought has orchestrated an array of assignments aimed at uplifting underprivileged populations and fostering long-term growth.

This project offers a hands-on opportunity to explore how data-driven insights can guide and strengthen these humanitarian efforts. You'll work with the GoodThought PostgreSQL database, which holds detailed records of assignments, funding, impact scores, and donor activity from 2010 to 2023. It contains three tables:

  • Assignments: Details about each project, including its name, duration (start and end dates), budget, geographical region, and impact score.
  • Donations: Records of financial contributions, linked to specific donors and assignments, highlighting how financial support is allocated and utilized.
  • Donors: Information on individuals and organizations that fund GoodThought’s projects, including donor types.
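A minimal DDL sketch of how the three tables relate. The table and column names are the ones the queries in this notebook use, and the two relationships (donations.donor_id to donors, donations.assignment_id to assignments) come from their joins. The data types, primary keys, and foreign-key constraints are assumptions, not the actual GoodThought schema. The date columns of assignments are shown as TEXT because the date-parsing query applies regular expressions to them.

```sql
-- Hypothetical schema sketch: names come from the queries in this notebook;
-- types, keys, and constraints are assumptions. Only columns used here are shown.
CREATE TABLE donors (
    donor_id   INTEGER PRIMARY KEY,
    donor_type TEXT
);

CREATE TABLE assignments (
    assignment_id   INTEGER PRIMARY KEY,
    assignment_name TEXT,
    start_date      TEXT,      -- assumed text: mixed DD/MM/YYYY and YYYY-MM-DD strings
    end_date        TEXT,      -- same assumption as start_date
    budget          NUMERIC,
    region          TEXT,
    impact_score    NUMERIC
);

-- Each donation links one donor to one assignment.
CREATE TABLE donations (
    donation_id   INTEGER PRIMARY KEY,
    donor_id      INTEGER REFERENCES donors (donor_id),
    assignment_id INTEGER REFERENCES assignments (assignment_id),
    amount        NUMERIC,
    donation_date DATE         -- assumed type
);
```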

Refer to the ERD below for a visual representation of the relationships between these tables.

You will write SQL queries to answer the two questions listed in the instructions. Good luck!

-- Result saved as DataFrame: assignments
SELECT * 
FROM public.assignments
LIMIT 5
-- Result saved as DataFrame: df4
SELECT * 
FROM public.donations
LIMIT 5
-- Result saved as DataFrame: df5
SELECT donor_id, SUM(amount) AS tot_amt
FROM public.donations
GROUP BY donor_id
ORDER BY tot_amt DESC
LIMIT 5
-- Result saved as DataFrame: df6
SELECT c.donor_type, COUNT(d.donation_id) AS no_of_donations,
	SUM(d.amount) AS tot_amt, ROUND(AVG(d.amount), 2) AS avg_amt
FROM public.donations AS d
JOIN public.donors AS c
ON d.donor_id = c.donor_id
GROUP BY c.donor_type
ORDER BY tot_amt DESC
-- Result saved as DataFrame: df
SELECT
	donation_date, COUNT(donation_id) AS no_of_donations,
	SUM(amount) AS tot_amt, ROUND(AVG(amount), 2) AS avg_amt
FROM public.donations
GROUP BY donation_date
ORDER BY no_of_donations DESC, tot_amt DESC
LIMIT 5
-- Result saved as DataFrame: df1
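-- For each region, the assignment with the largest budget and the total donations
-- it received (the inner join keeps only assignments with at least one donation).
SELECT DISTINCT ON (a.region)
	a.assignment_name, a.region, a.budget AS max_budget,
	SUM(d.amount) OVER (PARTITION BY a.assignment_id) AS tot_amt
FROM public.assignments AS a
JOIN public.donations AS d
ON a.assignment_id = d.assignment_id
ORDER BY a.region, a.budget DESC, a.assignment_id;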
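DISTINCT ON keeps the first row of each group, where "first" is whatever the ORDER BY puts first, and the DISTINCT ON expression must lead the ORDER BY. A small sketch on made-up rows (the values and the sample_assignments name are hypothetical) shows the rule and why a tie-breaker column helps when two budgets are equal.

```sql
-- Hypothetical toy rows: Region A has a clear maximum, Region B has a tie.
WITH sample_assignments (assignment_id, region, budget) AS (
    VALUES (1, 'Region A', 500.00),
           (2, 'Region A', 900.00),
           (3, 'Region B', 700.00),
           (4, 'Region B', 700.00)
)
SELECT DISTINCT ON (region)
       region, assignment_id, budget
FROM sample_assignments
-- region must come first; assignment_id breaks the Region B tie the same way every run.
ORDER BY region, budget DESC, assignment_id;
-- Region A | 2 | 900.00
-- Region B | 3 | 700.00
```

Without the assignment_id tie-breaker, PostgreSQL may return either row 3 or row 4 for Region B.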
-- Result saved as DataFrame: df3
SELECT
	region,
	AVG(impact_score) AS avg_impact,
	ROUND(AVG(budget), 2) AS avg_budget,
	COUNT(assignment_id) AS num
FROM public.assignments
GROUP BY region
ORDER BY region
-- Saved as query: query (referenced by the next cell)
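-- Parse the text date columns, which mix DD/MM/YYYY and YYYY-MM-DD strings.
-- The WHERE clause drops rows whose dates match neither pattern, so the
-- ELSE NULL branches only act as a safeguard.
SELECT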
    assignment_name,
    region,
    CASE
        WHEN start_date ~ '^\d{2}/\d{2}/\d{4}$' THEN TO_DATE(start_date, 'DD/MM/YYYY')
        WHEN start_date ~ '^\d{4}-\d{2}-\d{2}$' THEN TO_DATE(start_date, 'YYYY-MM-DD')
        ELSE NULL
    END AS start_date,
    CASE
        WHEN end_date ~ '^\d{2}/\d{2}/\d{4}$' THEN TO_DATE(end_date, 'DD/MM/YYYY')
        WHEN end_date ~ '^\d{4}-\d{2}-\d{2}$' THEN TO_DATE(end_date, 'YYYY-MM-DD')
        ELSE NULL
    END AS end_date,
	budget, impact_score
FROM
    public.assignments
WHERE
    (start_date ~ '^\d{4}-\d{2}-\d{2}$'
    OR start_date ~ '^\d{2}/\d{2}/\d{4}$')
	AND (end_date ~ '^\d{4}-\d{2}-\d{2}$'  
    OR end_date ~ '^\d{2}/\d{2}/\d{4}$')
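The CASE expressions above pick a TO_DATE format according to which regular expression the string matches. A sketch on three hypothetical strings (not taken from the database) shows the effect: both accepted formats parse to the same date, and a string matching neither pattern yields NULL.

```sql
-- Hypothetical sample strings: two accepted formats and one that matches neither.
SELECT raw,
       CASE
           WHEN raw ~ '^\d{2}/\d{2}/\d{4}$' THEN TO_DATE(raw, 'DD/MM/YYYY')
           WHEN raw ~ '^\d{4}-\d{2}-\d{2}$' THEN TO_DATE(raw, 'YYYY-MM-DD')
       END AS parsed          -- no ELSE branch, so unmatched strings give NULL
FROM (VALUES ('25/03/2015'), ('2015-03-25'), ('March 2015')) AS t (raw);
-- The first two rows parse to 25 March 2015; 'March 2015' gives NULL.
```

The day-first reading of slash-separated dates is the query's own choice: a string such as 03/04/2015 cannot be told apart from a month-first date by its format alone.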
-- Result saved as DataFrame: df2
-- query is the saved query from the previous cell, with start_date and end_date as DATE values.
SELECT
    assignment_name,
    region,
    (end_date - start_date) AS no_of_days,
    budget, impact_score
FROM query
ORDER BY no_of_days DESC
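The no_of_days column relies on PostgreSQL date arithmetic: subtracting one DATE from another returns an integer number of days, not an interval. A one-line check with arbitrary dates:

```sql
-- DATE minus DATE yields an integer day count (2015 is not a leap year).
SELECT DATE '2015-03-25' - DATE '2015-01-01' AS no_of_days;
-- 83
```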
-- highest_donation_assignments
SELECT a.assignment_name, a.region, ROUND(SUM(d.amount), 2) AS rounded_total_donation_amount, b.donor_type
FROM assignments AS a
JOIN donations AS d
ON a.assignment_id = d.assignment_id
JOIN donors AS b
ON d.donor_id = b.donor_id
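-- Group by assignment_id too, so distinct assignments that share a name stay separate.
GROUP BY a.assignment_id, a.assignment_name, a.region, b.donor_type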
ORDER BY rounded_total_donation_amount DESC
LIMIT 5
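Grouping on assignment_name alone would merge two different assignments that happen to share a name. A sketch on hypothetical rows (the sample_donations name and values are made up) shows the difference between grouping by the name and grouping by the key plus the name.

```sql
-- Hypothetical toy rows: assignments 1 and 2 are different projects with the same name.
WITH sample_donations (assignment_id, assignment_name, amount) AS (
    VALUES (1, 'Assignment X', 100.00),
           (1, 'Assignment X',  50.00),
           (2, 'Assignment X', 200.00)
)
-- Grouping by name only: a single merged row totalling 350.00.
SELECT NULL::integer AS assignment_id, assignment_name, SUM(amount) AS total
FROM sample_donations
GROUP BY assignment_name
UNION ALL
-- Grouping by id and name: 150.00 for assignment 1 and 200.00 for assignment 2.
SELECT assignment_id, assignment_name, SUM(amount)
FROM sample_donations
GROUP BY assignment_id, assignment_name;
```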
-- top_regional_impact_assignments
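-- Highest-impact assignment in each region; the inner join keeps only
-- assignments that received at least one donation.
SELECT DISTINCT ON (a.region)
	a.assignment_name, a.region, a.impact_score, COUNT(d.donation_id) AS num_total_donations
FROM assignments AS a
JOIN donations AS d
ON a.assignment_id = d.assignment_id
GROUP BY a.assignment_id, a.assignment_name, a.region, a.impact_score
ORDER BY a.region ASC, a.impact_score DESC, a.assignment_id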
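DISTINCT ON is a PostgreSQL extension. The same "top row per region" selection can be written with the standard ROW_NUMBER() window function; this is a swapped-in technique, not the query above, shown on made-up rows (the names, regions, and scores are hypothetical).

```sql
-- Hypothetical toy rows.
WITH sample_assignments (assignment_name, region, impact_score) AS (
    VALUES ('Assignment 1', 'Region A', 8.5),
           ('Assignment 2', 'Region A', 9.1),
           ('Assignment 3', 'Region B', 7.2)
),
ranked AS (
    SELECT assignment_name, region, impact_score,
           -- rn = 1 marks the highest score in each region; the name breaks ties
           ROW_NUMBER() OVER (PARTITION BY region
                              ORDER BY impact_score DESC, assignment_name) AS rn
    FROM sample_assignments
)
SELECT assignment_name, region, impact_score
FROM ranked
WHERE rn = 1
ORDER BY region;
-- Assignment 2 | Region A | 9.1
-- Assignment 3 | Region B | 7.2
```

Replacing ROW_NUMBER() with RANK() would keep every assignment tied for a region's top score instead of picking exactly one.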