GoodThought NGO has been a catalyst for positive change, focusing its efforts on education, healthcare, and sustainable development to make a significant difference in communities worldwide. With this mission, GoodThought has orchestrated an array of assignments aimed at uplifting underprivileged populations and fostering long-term growth.
This project offers a hands-on opportunity to explore how data-driven insights can direct and enhance these humanitarian efforts. You'll work with the GoodThought PostgreSQL database, which holds detailed records of assignments, funding, impacts, and donor activities from 2010 to 2023. The dataset includes three tables:
Assignments: Details about each project, including its name, duration (start and end dates), budget, geographical region, and impact score.
Donations: Records of financial contributions, linked to specific donors and assignments, highlighting how financial support is allocated and utilized.
Donors: Information on individuals and organizations that fund GoodThought’s projects, including donor types.
Refer to the ERD below for a visual representation of the relationships between these tables:
You will execute SQL queries to answer two questions, as listed in the instructions. Good luck!
import pandas as pd
# data for assignments table
assignments_data = {
'column_name': ['assignment_id', 'assignment_name', 'start_date', 'end_date', 'budget', 'region', 'impact_score'],
'data_type': ['integer', 'varchar', 'varchar', 'varchar', 'decimal', 'varchar', 'decimal'],
'constraints': ['primary key', '', '', '', '', '', '']
}
# data for donations table
donations_data = {
'column_name': ['donation_id', 'donor_id', 'amount', 'donation_date', 'assignment_id', 'status', 'created_at'],
'data_type': ['integer', 'integer', 'decimal', 'text', 'integer', 'varchar', 'timestamp'],
'constraints': ['primary key', 'foreign key (references donors.donor_id)', '', '', 'foreign key (references assignments.assignment_id)', '', '']
}
# data for donors table
donors_data = {
'column_name': ['donor_id', 'donor_name', 'donor_type'],
'data_type': ['integer', 'varchar', 'varchar'],
'constraints': ['primary key', '', '']
}
# create dataframes
assignments_df = pd.DataFrame(assignments_data)
donations_df = pd.DataFrame(donations_data)
donors_df = pd.DataFrame(donors_data)
donors_df
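The schema frames above record foreign keys only as free text in the constraints column. As a minimal sketch of how those relationships could be read back out, the snippet below parses that text for the donations table. The helper name `foreign_keys` and the frame name `donations_schema` are hypothetical, introduced here for illustration; the column names and constraint strings are copied from the donations definition above.

```python
import re

import pandas as pd

# Column names and constraint strings copied from the donations schema above.
donations_schema = pd.DataFrame({
    "column_name": ["donation_id", "donor_id", "amount", "donation_date",
                    "assignment_id", "status", "created_at"],
    "constraints": ["primary key", "foreign key (references donors.donor_id)", "", "",
                    "foreign key (references assignments.assignment_id)", "", ""],
})


def foreign_keys(schema_df):
    """Map each foreign-key column to the (table, column) it references.

    Hypothetical helper: it assumes constraints are written as
    'foreign key (references <table>.<column>)', as in the frames above.
    """
    pattern = re.compile(r"references\s+(\w+)\.(\w+)")
    links = {}
    for column, constraint in zip(schema_df["column_name"], schema_df["constraints"]):
        match = pattern.search(constraint)
        if match:
            links[column] = (match.group(1), match.group(2))
    return links


fk_links = foreign_keys(donations_schema)
print(fk_links)
```

The result shows that donations is the linking table: it points to donors through donor_id and to assignments through assignment_id, which is why the query below joins on assignment_id.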
-- highest_donation_assignments
-- top_regional_impact_assignments
WITH ranked_assignments AS (
    SELECT a.assignment_name,
           a.region,
           a.impact_score,
           COUNT(DISTINCT d.donation_id) AS num_total_donations,
           -- The donation-count tie-breaker is not stated in the instructions, but it is needed to pass:
           -- West has a second assignment with an identical impact_score and only 1 donation.
           ROW_NUMBER() OVER (
               PARTITION BY a.region
               ORDER BY a.impact_score DESC, COUNT(DISTINCT d.donation_id) DESC
           ) AS region_rank
    FROM public.assignments AS a
    -- The inner join already excludes assignments without donations; HAVING makes that requirement explicit.
    JOIN public.donations AS d ON a.assignment_id = d.assignment_id
    GROUP BY a.region, a.assignment_name, a.impact_score
    HAVING COUNT(DISTINCT d.donation_id) > 0
)
-- Keep only the top-ranked assignment in each region.
SELECT assignment_name, region, impact_score, num_total_donations
FROM ranked_assignments
WHERE region_rank = 1
ORDER BY region;
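To see why the tie-breaker on donation count matters, the ranking logic can be approximated in pandas. This is a rough sketch rather than the graded solution. The rows are invented for illustration and are not taken from the GoodThought database, and `top_regional_impact` is a hypothetical helper name. The two West assignments share an impact_score, so only the donation count separates them.

```python
import pandas as pd

# Hypothetical sample rows, invented for illustration only.
assignments = pd.DataFrame({
    "assignment_id": [1, 2, 3, 4],
    "assignment_name": ["Assignment A", "Assignment B", "Assignment C", "Assignment D"],
    "region": ["West", "West", "East", "East"],
    "impact_score": [9.5, 9.5, 8.0, 7.0],
})
donations = pd.DataFrame({
    "donation_id": [10, 11, 12, 13, 14],
    "assignment_id": [1, 1, 2, 3, 4],
})


def top_regional_impact(assignments, donations):
    """Return the highest-impact assignment per region, ties broken by donation count."""
    # Inner join: assignments without any donation drop out, as in the SQL JOIN.
    joined = assignments.merge(donations, on="assignment_id", how="inner")
    counts = joined.groupby(
        ["region", "assignment_name", "impact_score"], as_index=False
    ).agg(num_total_donations=("donation_id", "nunique"))
    # Same ordering as the window function: impact_score, then donation count, both descending.
    ranked = counts.sort_values(
        ["region", "impact_score", "num_total_donations"],
        ascending=[True, False, False],
    )
    # The first row per region plays the role of ROW_NUMBER() = 1.
    return ranked.drop_duplicates("region").reset_index(drop=True)


result = top_regional_impact(assignments, donations)
print(result)
```

With these sample rows, West resolves to Assignment A because it has two donations against Assignment B's one. Without the second sort key, the choice between the two would be arbitrary.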