Skip to content

In today's fast-paced and competitive educational environment, understanding the factors that influence student success is more important than ever. Just like the transport system in a bustling city like London must adapt to serve its residents, schools and educators must adapt to meet the needs of students. In this project, we will take a deep dive into a dataset containing rich details about various aspects of student life, such as hours studied, sleep patterns, attendance, and more, to uncover what truly impacts exam performance.

The dataset we'll be working with includes a wide range of factors influencing student performance. By analyzing this data, we'll be able to identify key drivers of success and provide insights that could help students, teachers, and policymakers make informed decisions. The table we'll use for this project is called student_performance and includes the following data:

ColumnDefinitionData type
attendancePercentage of classes attendedfloat
extracurricular_activitiesParticipation in extracurricular activitiesvarchar (Yes, No)
sleep_hoursAverage number of hours of sleep per nightfloat
tutoring_sessionsNumber of tutoring sessions attended per monthinteger
teacher_qualityQuality of the teachersvarchar (Low, Medium, High)
exam_scoreFinal exam scorefloat

You will execute SQL queries to answer three questions, as listed in the instructions.

Do more study hours and extracurricular activities lead to better scores?

Analyze how studying more than 10 hours per week, while also participating in extracurricular activities, impacts exam performance. The output should include two columns: 1) hours_studied and; 2) avg_exam_score.

Group and sort the results by hours_studied in descending order. Save the query as avg_exam_score_by_study_and_extracurricular.

Spinner
DataFrameas
avg_exam_score_by_study_and_extracurricular
variable
-- avg_exam_score_by_study_and_extracurricular
-- Edit the query below as needed
SELECT hours_studied, AVG(exam_score) AS avg_exam_score
FROM student_performance
WHERE extracurricular_activities = 'Yes'
GROUP BY hours_studied
ORDER BY hours_studied DESC
LIMIT 30;
Hidden output
indexhours_studiedavg_exam_score
04378
13975
23873.5
33773
43670.4285714286
53572.3125
63471.1875
73370.3333333333
83271.325
93170.5531914894
103071.4328358209
112970.256097561
122869.8256880734
132769.7768595041
142668.801369863
152569
162468.4838709677
172368.2074688797
182267.7215189873
192167.7104247104
202067.2186379928
211967.2117647059
221867.024
231766.401826484
241666.5860465116
251565.9010989011
261465.8622754491
271364.9761904762
281264.9661016949
291165.1458333333

Is there a sweet spot for study hours? Explore how different ranges of study hours impact exam performance by calculating the average exam score for each study range.

Categorize students into four groups based on hours studied per week: 1-5 hours, 6-10 hours, 11-15 hours, and 16+ hours. The output should contain two columns: 1) hours_studied_range and 2) avg_exam_score.

Group the results by hours_studied_range and sort them by avg_exam_score in descending order. Save the query as avg_exam_score_by_hours_studied_range.

Spinner
DataFrameas
avg_exam_score_by_hours_studied_range
variable
-- avg_exam_score_by_hours_studied_range
-- Add solution code below 
SELECT
	CASE
		WHEN hours_studied < 1 THEN NULL
		WHEN hours_studied >= 1 AND hours_studied <= 5 THEN '1-5 hours'
		WHEN hours_studied >= 6 AND hours_studied <= 10 THEN '6-10 hours'
		WHEN hours_studied >= 11 AND hours_studied <= 15 THEN '11-15 hours'
		ELSE '16+ hours'
	END AS hours_studied_range, AVG(exam_score) AS avg_exam_score
FROM student_performance
GROUP BY hours_studied_range
ORDER BY avg_exam_score DESC;
Hidden output
indexhours_studied_rangeavg_exam_score
016+ hours67.9233633869
111-15 hours65.2043859649
26-10 hours64.2254901961
31-5 hours62.6271186441

A teacher wants to show their students their relative rank in the class, without revealing their exam scores to each other.

Use a window function to assign ranks based on exam_score, ensuring that students with the same exam score share the same rank and no ranks are skipped. Return the columns attendance, hours_studied, sleep_hours, tutoring_sessions, and exam_rank. The students with the highest exam score should be at the top of the results, so order your query by exam_rank in ascending order. Limit your query to 30 students.

Spinner
DataFrameas
student_exam_ranking
variable
-- student_exam_ranking
-- paste code into third SQL cell 
SELECT attendance, hours_studied, sleep_hours, tutoring_sessions,
	DENSE_RANK() OVER(ORDER BY exam_score DESC) AS exam_rank
FROM student_performance
ORDER BY exam_rank
LIMIT 30;
Hidden output
indexattendancehours_studiedsleep_hourstutoring_sessionsexam_rank
09827651
18918432
29014843
38323413
49628414
59028904
68316824
78315725
87421615
99925705
109318726
119018617
126721617
139624628
149825718
157629828
169019508
176926709
187325729
198814310
2069317210
2170219111
2277146112
2392219212
2470197012
2586188213
2698309113
2762117313
2871118114
296978114