Project: Impact Analysis of GoodThought NGO Initiatives

GoodThought NGO has been a catalyst for positive change, focusing its efforts on education, healthcare, and sustainable development to make a significant difference in communities worldwide. With this mission, GoodThought has orchestrated an array of assignments aimed at uplifting underprivileged populations and fostering long-term growth.

This project offers a hands-on opportunity to explore how data-driven insights can direct and enhance these humanitarian efforts. You'll work with the GoodThought PostgreSQL database, which holds detailed records of assignments, funding, impacts, and donor activities from 2010 to 2023. The dataset includes:

Assignments: Details about each project, including its name, duration (start and end dates), budget, geographical region, and the impact score.
Donations: Records of financial contributions, linked to specific donors and assignments, highlighting how financial support is allocated and utilized.
Donors: Information on individuals and organizations that fund GoodThought’s projects, including donor types.
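
As a rough textual sketch of how these three tables relate, the columns referenced by the queries later in this notebook suggest a layout along the following lines. The column names are taken from those queries; every data type, the key constraints, and the `sketch` schema name are assumptions made for illustration and may not match the real database.

```sql
-- Illustrative only: built in a hypothetical "sketch" schema so it cannot
-- touch the real public tables. All column types are assumptions.
CREATE SCHEMA IF NOT EXISTS sketch;

CREATE TABLE sketch.donors (
    donor_id   INTEGER PRIMARY KEY,
    donor_type TEXT
);

CREATE TABLE sketch.assignments (
    assignment_id   INTEGER PRIMARY KEY,
    assignment_name TEXT,
    start_date      TEXT,    -- assumed text, since the notebook regex-matches it
    end_date        TEXT,    -- assumed text, same reason
    budget          NUMERIC,
    region          TEXT,
    impact_score    NUMERIC
);

-- Each donation points at one donor and one assignment.
CREATE TABLE sketch.donations (
    donation_id   INTEGER PRIMARY KEY,
    donor_id      INTEGER REFERENCES sketch.donors (donor_id),
    assignment_id INTEGER REFERENCES sketch.assignments (assignment_id),
    amount        NUMERIC,
    donation_date DATE
);
```

Under this reading, donations is the linking table: joining it to assignments on assignment_id and to donors on donor_id connects funding to both projects and donor types.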

Refer to the entity-relationship diagram (ERD) below for a visual representation of the relationships between these data tables:

You will execute SQL queries to answer two questions, as listed in the instructions. Good luck!

DataFrame variable: assignments

SELECT * 
FROM public.assignments
LIMIT 5

DataFrame variable: df4

SELECT * 
FROM public.donations
LIMIT 5

DataFrame variable: df5

SELECT donor_id, SUM(amount) AS tot_amt
FROM public.donations
GROUP BY donor_id
ORDER BY tot_amt DESC
LIMIT 5

DataFrame variable: df6

SELECT c.donor_type, COUNT(d.donation_id) AS no_of_donations,
	SUM(d.amount) AS tot_amt, ROUND(AVG(d.amount), 2) AS avg_amt
FROM public.donations AS d
JOIN public.donors AS c
ON d.donor_id = c.donor_id
GROUP BY c.donor_type
ORDER BY tot_amt DESC

DataFrame variable: df

SELECT
	donation_date, COUNT(donation_id) AS no_of_donations,
	SUM(amount) AS tot_amt, ROUND(AVG(amount), 2) AS avg_amt
FROM public.donations
GROUP BY donation_date
ORDER BY no_of_donations DESC, tot_amt DESC
LIMIT 5

DataFrame variable: df1

-- Highest-budget assignment in each region (among assignments with donations),
-- with the total amount donated to it
SELECT DISTINCT ON (a.region)
	a.assignment_name, a.region, a.budget AS max_budget,
	SUM(d.amount) OVER (PARTITION BY a.assignment_name) AS tot_amt
FROM public.assignments AS a
JOIN public.donations AS d
ON a.assignment_id = d.assignment_id
ORDER BY a.region, a.budget DESC;

DataFrame variable: df3

-- Average impact score, average budget, and assignment count per region
SELECT
	region,
	AVG(impact_score) AS avg_impact,
	ROUND(AVG(budget), 2) AS avg_budget,
	COUNT(assignment_id) AS num
FROM public.assignments
GROUP BY region
ORDER BY region

Query variable: query

SELECT
    assignment_name,
    region,
    CASE
        WHEN start_date ~ '^\d{2}/\d{2}/\d{4}$' THEN TO_DATE(start_date, 'DD/MM/YYYY')
        WHEN start_date ~ '^\d{4}-\d{2}-\d{2}$' THEN TO_DATE(start_date, 'YYYY-MM-DD')
        ELSE NULL
    END AS start_date,
    CASE
        WHEN end_date ~ '^\d{2}/\d{2}/\d{4}$' THEN TO_DATE(end_date, 'DD/MM/YYYY')
        WHEN end_date ~ '^\d{4}-\d{2}-\d{2}$' THEN TO_DATE(end_date, 'YYYY-MM-DD')
        ELSE NULL
    END AS end_date,
    budget,
    impact_score
FROM
    public.assignments
WHERE
    (start_date ~ '^\d{4}-\d{2}-\d{2}$'
     OR start_date ~ '^\d{2}/\d{2}/\d{4}$')
    AND (end_date ~ '^\d{4}-\d{2}-\d{2}$'
         OR end_date ~ '^\d{2}/\d{2}/\d{4}$')
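
To see how the two date patterns behave, the same CASE logic can be run on a few literal values. This is a minimal sketch: the `sample` rows are made up for illustration and do not come from the GoodThought database, and the slash-separated form is read as day/month/year, as in the query above.

```sql
-- Hypothetical sample rows; not taken from the real assignments table.
WITH sample (start_date, end_date) AS (
    VALUES
        ('2021-01-15', '2021-03-01'),    -- ISO-style dates
        ('15/01/2021', '01/03/2021'),    -- day/month/year dates
        ('Jan 15 2021', '2021-03-01')    -- start date matches neither pattern
),
parsed AS (
    SELECT
        start_date AS raw_start,
        CASE
            WHEN start_date ~ '^\d{2}/\d{2}/\d{4}$' THEN TO_DATE(start_date, 'DD/MM/YYYY')
            WHEN start_date ~ '^\d{4}-\d{2}-\d{2}$' THEN TO_DATE(start_date, 'YYYY-MM-DD')
        END AS start_dt,                 -- NULL when neither pattern matches
        CASE
            WHEN end_date ~ '^\d{2}/\d{2}/\d{4}$' THEN TO_DATE(end_date, 'DD/MM/YYYY')
            WHEN end_date ~ '^\d{4}-\d{2}-\d{2}$' THEN TO_DATE(end_date, 'YYYY-MM-DD')
        END AS end_dt
    FROM sample
)
SELECT
    raw_start,
    start_dt,
    end_dt,
    end_dt - start_dt AS no_of_days      -- date minus date gives whole days
FROM parsed;
```

The first two rows describe the same period (15 January to 1 March 2021) in different formats, so both should yield 45 days. The third row has no WHERE filter to remove it here, so it comes back with a NULL start date and a NULL duration.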

DataFrame variable: df2

SELECT
    assignment_name,
    region,
    (end_date - start_date) AS no_of_days,
    budget,
    impact_score
FROM query
ORDER BY no_of_days DESC

DataFrame variable: highest_donation_assignments

-- highest_donation_assignments
SELECT a.assignment_name, a.region, ROUND(SUM(d.amount), 2) AS rounded_total_donation_amount, b.donor_type
FROM assignments AS a
JOIN donations AS d
ON a.assignment_id = d.assignment_id
JOIN donors AS b
ON d.donor_id = b.donor_id
GROUP BY a.assignment_name, a.region, b.donor_type
ORDER BY rounded_total_donation_amount DESC
LIMIT 5

DataFrame variable: top_regional_impact_assignments

-- top_regional_impact_assignments
SELECT DISTINCT ON (a.region)
	a.assignment_name,
	a.region,
	MAX(a.impact_score) AS impact_score,
	COUNT(d.donation_id) AS num_total_donations
FROM assignments AS a
JOIN donations AS d
ON a.assignment_id = d.assignment_id
GROUP BY a.assignment_name, a.region
ORDER BY a.region ASC, MAX(a.impact_score) DESC