Skip to content

In today's fast-paced and competitive educational environment, understanding the factors that influence student success is more important than ever. Just like the transport system in a bustling city like London must adapt to serve its residents, schools and educators must adapt to meet the needs of students. In this project, we will take a deep dive into a dataset containing rich details about various aspects of student life, such as hours studied, sleep patterns, attendance, and more, to uncover what truly impacts exam performance.

The dataset we'll be working with includes a wide range of factors influencing student performance. By analyzing this data, we'll be able to identify key drivers of success and provide insights that could help students, teachers, and policymakers make informed decisions. The table we'll use for this project is called student_performance and includes the following data:

ColumnDefinitionData type
attendancePercentage of classes attendedfloat
extracurricular_activitiesParticipation in extracurricular activitiesvarchar (Yes, No)
sleep_hoursAverage number of hours of sleep per nightfloat
tutoring_sessionsNumber of tutoring sessions attended per monthinteger
teacher_qualityQuality of the teachersvarchar (Low, Medium, High)
exam_scoreFinal exam scorefloat

You will execute SQL queries to answer three questions, as listed in the instructions.

Spinner
DataFrameas
df
variable
SELECT *
FROM student_performance
Spinner
DataFrameas
avg_exam_score_by_study_and_extracurricular
variable
-- avg_exam_score_by_study_and_extracurricular
-- Edit the query below as needed
SELECT hours_studied, AVG(exam_score) AS avg_exam_score
FROM student_performance
WHERE (hours_studied > 10) AND (extracurricular_activities = 'Yes')
GROUP BY hours_studied
ORDER BY hours_studied DESC;
Hidden output
Spinner
DataFrameas
avg_exam_score_by_hours_studied_range
variable
-- avg_exam_score_by_hours_studied_range
-- Add solution code below
SELECT
	CASE
		WHEN (hours_studied BETWEEN 1 AND 5) THEN '1-5 hours'
		WHEN (hours_studied BETWEEN 6 AND 10) THEN '6-10 hours'
		WHEN (hours_studied BETWEEN 11 AND 15) THEN '11-15 hours'
		ELSE '16+ hours'
	END AS hours_studied_range, AVG(exam_score) AS avg_exam_score,
	AVG(sleep_hours) AS avg_sleep_hours, AVG(tutoring_sessions) AS avg_tutor_sessions
FROM public.student_performance
GROUP BY hours_studied_range
ORDER BY avg_exam_score DESC;
Spinner
DataFrameas
student_exam_ranking
variable
-- student_exam_ranking
-- Add solution code below 
SELECT attendance, hours_studied, sleep_hours, tutoring_sessions, 
	DENSE_RANK() OVER(ORDER BY exam_score DESC) AS exam_rank
FROM public.student_performance
ORDER BY exam_rank ASC
LIMIT 30;
Hidden output
Spinner
DataFrameas
df2
variable
SELECT
	CASE
		WHEN (exam_score BETWEEN 55 AND 59) THEN '55-59'
		WHEN (exam_score BETWEEN 60 AND 64) THEN '60-64'
		WHEN (exam_score BETWEEN 65 AND 69) THEN '65-69'
		ELSE '70+'
	END AS score_range, AVG(tutoring_sessions) AS avg_tutor_sessions,
	AVG(attendance) AS avg_attend_pct
FROM public.student_performance
GROUP BY score_range
ORDER BY avg_tutor_sessions DESC