Analyzing the influence of the test preparation course on exam scores.

Analyzing exam scores

Now let's move on to the competition and challenge.

📖 Background

Your best friend is an administrator at a large school. The school makes every student take year-end math, reading, and writing exams.

Since you have recently learned data manipulation and visualization, you suggest helping your friend analyze the score results. The school's principal wants to know if test preparation courses are helpful. She also wants to explore the effect of parental education level on test scores.

💾 The data

The file has the following fields (source):

"gender" - male / female
"race/ethnicity" - one of 5 combinations of race/ethnicity
"parent_education_level" - highest education level of either parent
"lunch" - whether the student receives free/reduced or standard lunch
"test_prep_course" - whether the student took the test preparation course
"math" - exam score in math
"reading" - exam score in reading
"writing" - exam score in writing

# Importing modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


# Reading in the data
df = pd.read_csv('data/exams.csv')

# Take a look at the first few rows
df.head()
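After loading, it can help to confirm that the table has the eight fields listed above and that the three score columns are numeric. This is a minimal sketch, assuming the CSV headers match the quoted field names exactly. `check_schema` is a hypothetical helper, not part of pandas, and the two toy rows are made up purely for illustration.

```python
import pandas as pd

# Expected columns, taken from the field list (assumption: headers match exactly)
EXPECTED_COLUMNS = [
    "gender", "race/ethnicity", "parent_education_level", "lunch",
    "test_prep_course", "math", "reading", "writing",
]
SCORE_COLUMNS = ["math", "reading", "writing"]


def check_schema(frame: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in SCORE_COLUMNS:
        # Only check dtype for score columns that are actually present
        if col in frame.columns and not pd.api.types.is_numeric_dtype(frame[col]):
            problems.append(f"non-numeric score column: {col}")
    return problems


# Made-up rows for illustration only, not taken from the real file
toy = pd.DataFrame({
    "gender": ["female", "male"],
    "race/ethnicity": ["group A", "group B"],
    "parent_education_level": ["high school", "master's degree"],
    "lunch": ["standard", "free/reduced"],
    "test_prep_course": ["none", "completed"],
    "math": [70, 80],
    "reading": [75, 65],
    "writing": [72, 68],
})
print(check_schema(toy))  # prints []
```

In the notebook, the same check would be run as `check_schema(df)` right after `pd.read_csv`.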

💪 Challenge

Create a report to answer the principal's questions. Include:

What are the average reading scores for students with/without the test preparation course?
What are the average scores for the different parental education levels?
Create plots to visualize findings for questions 1 and 2.
[Optional] Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
[Optional 2] The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
Summarize your findings.
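Question 2 and Optional 2 each reduce to a single pandas call over the three score columns: a grouped mean and a correlation matrix. This is a minimal sketch on made-up rows; the real analysis would use `df` from `pd.read_csv`. Selecting the score columns before `.mean()` keeps the text columns out of the aggregation, and `.corr()` computes Pearson correlations by default.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    "parent_education_level": ["high school", "high school",
                               "master's degree", "master's degree"],
    "math": [60, 70, 80, 90],
    "reading": [65, 75, 85, 95],
    "writing": [62, 72, 82, 92],
})

scores = ["math", "reading", "writing"]

# Question 2: average score per parental education level
by_parent = toy.groupby("parent_education_level")[scores].mean()
print(by_parent)

# Optional 2: pairwise Pearson correlations between subject scores
# (in this toy data every pair correlates perfectly, since the columns are shifted copies)
corr = toy[scores].corr()
print(corr.round(2))
```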

1. What are the average reading scores for students with/without the test preparation course?

df.groupby('test_prep_course').describe()

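# Mean score per subject for each prep-course group (numeric_only skips the text columns)
avg_score = df.groupby('test_prep_course').mean(numeric_only=True).sort_values(by='reading').reset_index()
# Reshape to long format: one row per (prep-course status, subject)
avg_score = avg_score.melt(id_vars=['test_prep_course'], value_vars=['math', 'reading', 'writing'])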

display(avg_score)
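The `melt` step turns the wide table, with one column per subject, into long format: one row per prep-course status and subject, with the columns `variable` and `value` that `FacetGrid` facets on. This is a minimal sketch on made-up scores showing the two shapes. `numeric_only=True` is needed in recent pandas versions because averaging a text column such as `gender` raises an error.

```python
import pandas as pd

# Made-up scores for illustration only
toy = pd.DataFrame({
    "test_prep_course": ["none", "none", "completed", "completed"],
    "gender": ["female", "male", "female", "male"],
    "math": [60, 70, 75, 85],
    "reading": [66, 68, 74, 76],
    "writing": [64, 62, 78, 72],
})

# Wide: one row per group, one column per subject
wide = (
    toy.groupby("test_prep_course")
    .mean(numeric_only=True)  # skip text columns such as gender
    .reset_index()
)

# Long: one row per (group, subject) pair, in columns 'variable' and 'value'
long = wide.melt(
    id_vars=["test_prep_course"],
    value_vars=["math", "reading", "writing"],
)
print(long)
```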

# Setting the plot theme
custom_params = {"axes.spines.right": False, "axes.spines.top": False}
sns.set_theme(style="ticks", rc=custom_params, palette="colorblind", font_scale=1.3)


# Grid of bar plots, one panel per subject
avg_plot = sns.FacetGrid(avg_score, col="variable", hue="test_prep_course", height=7, aspect=.6)
avg_plot = avg_plot.map(sns.barplot, "test_prep_course", "value", order=["none", "completed"])

# Adding axis labels
avg_plot.set_axis_labels("Status of the test preparation course", "Average score")

# Adding title to the figure
avg_plot.fig.subplots_adjust(top=0.9)
avg_plot.fig.suptitle('Average scores for students with/without the test preparation course')

The average reading score is 73.9 for students who completed the test preparation course and drops to 66.5 for students who did not take it.
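The drop quoted above is the difference between two group means. Computing that gap for every subject at once makes the comparison explicit. This is a minimal sketch on made-up scores, so the numbers it prints are illustrative and not the school's results.

```python
import pandas as pd

# Made-up scores for illustration only
toy = pd.DataFrame({
    "test_prep_course": ["none", "none", "completed", "completed"],
    "math": [60, 70, 75, 85],
    "reading": [66, 68, 74, 76],
    "writing": [64, 62, 78, 72],
})

# One row per prep-course status, one column per subject
means = toy.groupby("test_prep_course")[["math", "reading", "writing"]].mean()

# Points gained by completing the course (positive = completed scored higher)
gap = means.loc["completed"] - means.loc["none"]
print(gap)
```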

lunch_score = df.groupby(['test_prep_course', 'lunch']).mean(numeric_only=True).sort_values(by='reading').reset_index()
lunch_score = lunch_score.melt(id_vars=['test_prep_course', 'lunch'], value_vars=['math', 'reading', 'writing'])

display(lunch_score)

# Grid of bar plots: one row per subject, one column per lunch type
lunch_plot = sns.FacetGrid(lunch_score, col="lunch", row="variable", hue="test_prep_course", height=7, aspect=.6)
lunch_plot = lunch_plot.map(sns.barplot, "test_prep_course", "value", order=["none", "completed"])

# Adding axis labels
lunch_plot.set_axis_labels("Status of the test preparation course", "Average score")

# Adding title to the figure
lunch_plot.fig.subplots_adjust(top=0.9)
lunch_plot.fig.suptitle('Average scores for students with/without the test preparation course and type of lunch')

Grouping the data by lunch type and by test preparation course status, we can see that students who received a free/reduced lunch scored lower on average than students who received a standard lunch.
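The same lunch comparison can be read off a single pivot table, with one row per prep-course status and one column per lunch type. The gap within each prep group is then a column subtraction. This is a minimal sketch on made-up reading scores. The lunch labels follow the field description (`standard`, `free/reduced`), which is an assumption about the exact spelling in the file.

```python
import pandas as pd

# Made-up reading scores for illustration only
toy = pd.DataFrame({
    "test_prep_course": ["none", "none", "completed", "completed"],
    "lunch": ["standard", "free/reduced", "standard", "free/reduced"],
    "reading": [70, 62, 78, 70],
})

# Rows: prep-course status, columns: lunch type, cells: mean reading score
table = toy.pivot_table(index="test_prep_course", columns="lunch",
                        values="reading", aggfunc="mean")

# Standard minus free/reduced within each prep-course group
table["lunch_gap"] = table["standard"] - table["free/reduced"]
print(table)
```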

gender_score = df.groupby(['test_prep_course', 'gender']).mean(numeric_only=True).sort_values(by='reading').reset_index()
gender_score = gender_score.melt(id_vars=['test_prep_course', 'gender'], value_vars=['math', 'reading', 'writing'])

display(gender_score)

# Grid of bar plots: one row per subject, one column per gender
gender_plot = sns.FacetGrid(gender_score, col="gender", row="variable", hue="test_prep_course", height=7, aspect=.6)
gender_plot = gender_plot.map(sns.barplot, "test_prep_course", "value", order=["none", "completed"])

# Adding axis labels
gender_plot.set_axis_labels("Status of the test preparation course", "Average score")

# Adding title to the figure
gender_plot.fig.subplots_adjust(top=0.9)
gender_plot.fig.suptitle('Average scores for students with/without the test preparation course and gender')

Grouping the data by gender and by test preparation course status, we found that male students generally scored higher in math, while female students scored higher in reading and writing.
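To check this pattern numerically within each prep-course group, the grouped means can be unstacked by gender. Female and male columns are then subtracted subject by subject. This is a minimal sketch on made-up scores, where a positive value means female students scored higher on average.

```python
import pandas as pd

# Made-up scores for illustration only
toy = pd.DataFrame({
    "test_prep_course": ["none", "none", "completed", "completed"],
    "gender": ["female", "male", "female", "male"],
    "math": [60, 68, 70, 78],
    "reading": [70, 62, 80, 72],
    "writing": [69, 60, 80, 70],
})

# Mean per (prep-course status, gender), then move gender into the columns
means = toy.groupby(["test_prep_course", "gender"])[["math", "reading", "writing"]].mean()
wide = means.unstack("gender")  # columns are (subject, gender) pairs

# Female minus male, per prep-course group and subject
diff = (wide.xs("female", axis=1, level="gender")
        - wide.xs("male", axis=1, level="gender"))
print(diff)
```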