Analyzing exam scores

Now let's move on to the competition and challenge.

📖 Background

Your best friend is an administrator at a large school. The school makes every student take year-end math, reading, and writing exams.

Since you have recently learned data manipulation and visualization, you suggest helping your friend analyze the score results. The school's principal wants to know if test preparation courses are helpful. She also wants to explore the effect of parental education level on test scores.

💾 The data

The file has the following fields (source):
  • "gender" - male / female
  • "race/ethnicity" - one of 5 combinations of race/ethnicity
  • "parent_education_level" - highest education level of either parent
  • "lunch" - whether the student receives free/reduced or standard lunch
  • "test_prep_course" - whether the student took the test preparation course
  • "math" - exam score in math
  • "reading" - exam score in reading
  • "writing" - exam score in writing
head(df)

💪 Challenge

Create a report to answer the principal's questions. Include:

  1. What are the average reading scores for students with/without the test preparation course?
  2. What are the average scores for the different parental education levels?
  3. Create plots to visualize findings for questions 1 and 2.
  4. [Optional] Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
  5. [Optional 2] The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
  6. Summarize your findings.

Data Preparation

# Loading packages
library(tidyverse)

# Preparing the data
df <- readr::read_csv('data/exams.csv', show_col_types = FALSE)

# Convert the first five (categorical) columns to factors
new_df <- df %>%
    mutate(across(1:5, factor))

head(new_df)

Data Analysis

What are the average reading scores for students with/without the test preparation course?

Students who completed the test preparation course tend to perform better at reading than students who did not. The average reading score for students who took the course is 73.9, which is 7.4 points higher than the average for students who did not complete it.

avg_reading_by_prep <- new_df %>%
    group_by(test_prep_course) %>%
    summarise(avg_read_score = round(mean(reading), 2)) %>%
    print()

# Visualizing average reading scores for students with/without the test preparation course

ggplot(avg_reading_by_prep, aes(test_prep_course, avg_read_score, fill = test_prep_course)) +
    geom_col()
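
A difference between two group means can also be checked with a two-sample test. The sketch below is only an illustration: `toy` is a made-up data frame with invented scores (not the school's data), and the labels "completed" and "none" are assumed values for the course column. It computes the gap in mean reading score and runs Welch's t-test with `t.test()`.

```r
# Hypothetical toy data: scores are invented for illustration only
toy <- data.frame(
  test_prep_course = factor(rep(c("completed", "none"), each = 5)),
  reading = c(78, 74, 70, 81, 72, 66, 69, 61, 72, 62)
)

# Mean reading score per group (named vector: completed, none)
group_means <- tapply(toy$reading, toy$test_prep_course, mean)

# Difference in means: completed minus none
mean_gap <- unname(group_means["completed"] - group_means["none"])

# Welch's two-sample t-test (the default of t.test, var.equal = FALSE)
welch <- t.test(reading ~ test_prep_course, data = toy)

print(group_means)
print(mean_gap)
print(welch$p.value)
```

On the real data, the same call would be `t.test(reading ~ test_prep_course, data = new_df)`, assuming the column has exactly two levels.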

What are the average scores for the different parental education levels?

Across the 6 levels of parental education, students whose parents' highest level of education is a master's degree perform better overall than students in every other category. Students whose parents hold a bachelor's degree come next, with average scores only slightly lower than those of the master's degree category.

avg_scores_by_parent_educ_level <- new_df %>%
    select(parent_education_level, math, reading, writing) %>%
    group_by(parent_education_level) %>%
    summarise(avg_math_score = mean(math), avg_read_score = mean(reading), avg_writing_score = mean(writing)) %>%
    print()

# Visualizing average scores for each parent education level

avg_scores_by_parent_educ_level %>%
    pivot_longer(-parent_education_level, names_to = "category", values_to = "avg_scores") %>%
    ggplot(aes(category, avg_scores, fill = parent_education_level)) +
        geom_col() +
        facet_wrap(~parent_education_level, ncol = 2) +
        theme(legend.position="none")
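
By default `factor()` sorts its levels alphabetically, so the facets above appear in alphabetical rather than educational order. A minimal sketch of fixing the order with explicit levels follows. The six labels in `education_order` are an assumption about how the levels are spelled in the file, and `parent_education` is a hypothetical stand-in for the real column.

```r
# Assumed level labels, ordered from least to most formal education.
# Check them against levels(new_df$parent_education_level) before use.
education_order <- c(
  "some high school", "high school", "some college",
  "associate's degree", "bachelor's degree", "master's degree"
)

# Hypothetical values standing in for the real column
parent_education <- c(
  "master's degree", "high school", "some college",
  "some high school", "bachelor's degree", "associate's degree"
)

# factor() with explicit levels fixes the order used by ggplot facets and legends
parent_education <- factor(parent_education, levels = education_order)

print(levels(parent_education))
```

Any value not listed in `levels` becomes `NA`, so it is worth checking for missing values after the conversion.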

Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels.

Grouping students by parental education level and comparing average scores with and without test preparation shows that, regardless of parental education background, students who took the preparatory course score higher on average in math, reading, and writing than students who did not.

# Comparing the average scores for students with/without the test preparation course for different parental education levels

new_df %>%
    select(parent_education_level, test_prep_course, math, reading, writing) %>%
    group_by(parent_education_level, test_prep_course) %>%
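    # .groups = "drop" returns an ungrouped result and silences the regrouping message
    summarise(avg_math_score = mean(math), avg_read_score = mean(reading), avg_writing_score = mean(writing), .groups = "drop") %>%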
    pivot_longer(-c(parent_education_level, test_prep_course), names_to = "category", values_to = "avg_scores") %>%
    ggplot(aes(category, avg_scores, fill = test_prep_course)) +
        geom_col() +
        facet_grid(parent_education_level~test_prep_course) +
        theme(legend.position="none")
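
The within-subgroup comparison can also be expressed as one number per education level: the mean score with the course minus the mean score without it. The sketch below assumes the course column takes the values "completed" and "none" and uses a hypothetical `toy` tibble with invented math scores, so the numbers say nothing about the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two education levels, invented math scores
toy <- tibble(
  parent_education_level = rep(c("high school", "master's degree"), each = 4),
  test_prep_course = rep(c("completed", "completed", "none", "none"), times = 2),
  math = c(68, 72, 60, 64, 80, 84, 74, 78)
)

prep_gap <- toy %>%
  group_by(parent_education_level, test_prep_course) %>%
  summarise(avg_math = mean(math), .groups = "drop") %>%
  # One column per course status so the two means sit side by side
  pivot_wider(names_from = test_prep_course, values_from = avg_math) %>%
  # Positive gap: students who completed the course scored higher on average
  mutate(gap = completed - none)

print(prep_gap)
```

A positive `gap` in every row would support the claim that the course helps at each education level; a row with a negative or near-zero gap would be worth a closer look.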

The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.

The question is whether good performance in one subject goes together with good performance in the others. The answer is yes: performance in all three subjects is strongly associated. With a correlation above 0.8 for every pair of subjects, a high score in one subject is a good indicator of high scores in the other two.

# Investigating correlations between scores

new_df %>%
    select(math, reading, writing) %>%
    cor()
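
`cor()` returns a symmetric matrix with ones on the diagonal, so each subject pair appears twice. The sketch below, on a hypothetical `toy` data frame with invented scores, pulls out each pair once and uses `cor.test()` to check whether one of the correlations differs from zero. It is an illustration under those assumptions, not a result from the exam data.

```r
# Hypothetical toy scores, invented for illustration only
toy <- data.frame(
  math    = c(55, 62, 70, 48, 81, 90, 66, 74),
  reading = c(60, 65, 72, 50, 85, 88, 70, 71),
  writing = c(58, 66, 70, 49, 86, 90, 68, 73)
)

# Pearson correlation matrix (the default method of cor)
cor_matrix <- cor(toy)

# Keep each subject pair once by taking the upper triangle
pairwise <- cor_matrix[upper.tri(cor_matrix)]

# Test whether the math-reading correlation differs from zero
math_reading <- cor.test(toy$math, toy$reading)

print(round(cor_matrix, 2))
print(math_reading$p.value)
```

Pearson correlation measures linear association only; `cor(toy, method = "spearman")` is a rank-based alternative if the relationship looks monotonic but not linear.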

Summary of findings

  1. Students who complete the test preparation course score higher on average than students who don't.
  2. Students whose parents' highest level of education is a master's degree achieve higher average scores in all subjects than students whose parents have a lower level of education.
  3. Completing the test preparation course is associated with higher average scores for all students, regardless of their parents' educational background.
  4. The correlations between all three scores are above 0.8, indicating a high level of association between scores in all three subjects. In other words, students who perform well in one subject also tend to perform well in the other two.