Analyzing exam scores

Now let's move on to the competition and challenge.

📖 Background

Your best friend is an administrator at a large school. The school makes every student take year-end math, reading, and writing exams.

Since you have recently learned data manipulation and visualization, you suggest helping your friend analyze the score results. The school's principal wants to know if test preparation courses are helpful. She also wants to explore the effect of parental education level on test scores.

💾 The data

The file has the following fields (source):
  • "gender" - male / female
  • "race/ethnicity" - one of 5 combinations of race/ethnicity
  • "parent_education_level" - highest education level of either parent
  • "lunch" - whether the student receives free/reduced or standard lunch
  • "test_prep_course" - whether the student took the test preparation course
  • "math" - exam score in math
  • "reading" - exam score in reading
  • "writing" - exam score in writing
head(df)

💪 Challenge

Create a report to answer the principal's questions. Include:

  1. What are the average reading scores for students with/without the test preparation course?
  2. What are the average scores for the different parental education levels?
  3. Create plots to visualize findings for questions 1 and 2.
  4. [Optional] Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
  5. [Optional 2] The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
  6. Summarize your findings.

โŒ›๏ธ Time is ticking. Good luck!

Summary of Findings

Effect of Test Preparation Course

Students who took the test preparation course had an average reading score about 7 points higher than students who did not take it.

# Average reading scores for students with/without the test prep course
df %>%
	group_by(test_prep_course) %>%
	summarise(avg_reading_score = round(mean(reading),2)) %>%
	arrange(avg_reading_score)
# Average reading score based on test prep course
df %>%
  group_by(test_prep_course) %>%
  summarise(avg_reading_score = round(mean(reading),2)) %>%
  ggplot(aes(test_prep_course, avg_reading_score, fill = test_prep_course)) +
  geom_col()+
  labs(x = "Test Prep Course",
	  y = "Avg Reading Score",
	  fill = "Test Prep Course")+
  scale_fill_discrete(labels = c("With Test Prep", "Without Test Prep"))
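
The point difference quoted above can also be computed directly instead of being read off the summary table. The following is a minimal sketch only: `toy` is a hypothetical stand-in for `df` with made-up values, and the level names "completed" and "none" are assumptions about how `test_prep_course` is coded.

```r
library(dplyr)

# Hypothetical stand-in for df; these values are made up for illustration only.
toy <- data.frame(
  test_prep_course = c("completed", "completed", "none", "none"),
  reading = c(80, 70, 66, 74)
)

# Mean reading score per test prep group
prep_means <- toy %>%
  group_by(test_prep_course) %>%
  summarise(avg_reading_score = mean(reading))

# Gap in average reading score: "completed" minus "none" (assumed level names)
reading_gap <-
  prep_means$avg_reading_score[prep_means$test_prep_course == "completed"] -
  prep_means$avg_reading_score[prep_means$test_prep_course == "none"]

reading_gap
```

Replacing `toy` with `df` should give the gap for the real data, provided the column is coded as assumed.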

Average Scores Based on Parental Education Levels

Students whose parents have at least some college education tend to have higher average math, reading, and writing scores. However, students whose parents had only some high school education had higher average scores than students whose parents completed high school.

#Average scores based on different parental education levels
df %>%
  group_by(parent_education_level) %>%
  summarise(avg_math_score = round(mean(math), 2),
            avg_reading_score = round(mean(reading), 2),
            avg_writing_score = round(mean(writing), 2))
avg_score_parent<- df %>%
	group_by(parent_education_level) %>%
	summarise(avg_math_score = round(mean(math), 2),
            avg_reading_score = round(mean(reading), 2),
            avg_writing_score = round(mean(writing), 2)) %>%
	pivot_longer(cols = c('avg_math_score', 'avg_reading_score', 'avg_writing_score'),
				 names_to = 'category',
				 values_to = 'avg_scores')
#Average score based on Parent Education Level
avg_score_parent %>%
  ggplot(aes(x = factor(parent_education_level, levels = c("some high school", "high school", "some college", "associate's degree", "bachelor's degree", "master's degree")), y = avg_scores, fill = category)) +
  geom_col(position = "dodge") +
  labs(x = "Parent Education Level", y = "Avg Scores", fill = "Subject") +
  scale_fill_discrete(labels = c("Avg Math Score", "Avg Reading Score", "Avg Writing Score")) +
  theme(axis.text.x = element_text(angle = 90))

Comparing average scores for students with/without the test preparation course based on different parental education levels

Averaging scores by both parental education level and test preparation status shows higher average scores for students who took the test preparation course. The trend remains that average scores were higher for students whose parents had more education.

parent_prep_score<- df %>%
	group_by(parent_education_level, test_prep_course) %>%
	summarise(avg_math_score = round(mean(math), 2),
            avg_reading_score = round(mean(reading), 2),
            avg_writing_score = round(mean(writing), 2),
            .groups = "drop") %>%  # drop grouping explicitly to avoid the regrouping message
	pivot_longer(cols = c('avg_math_score', 'avg_reading_score', 'avg_writing_score'),
				 names_to = 'category',
				 values_to = 'avg_scores')
#Average score based on Parental education level separated by test prep course
parent_prep_score %>%
	ggplot(aes(x = factor(parent_education_level, levels = c("some high school", "high school", "some college", "associate's degree", "bachelor's degree", "master's degree")), y = avg_scores, fill = category))+
	geom_col(position = "dodge")+
	facet_grid(~test_prep_course)+
	labs(x = "Parent Education Level", 
		 y = "Avg Scores", 
		 fill = "Subject")+
	scale_fill_discrete(labels = c("Avg Math Score", "Avg Reading Score", "Avg Writing Score"))+
	theme(axis.text.x = element_text(angle = 90))
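
To compare students with and without the course inside each parental education level, the gap can be put in a single table. This is a rough sketch under stated assumptions: `toy` is a hypothetical data frame with made-up values, only reading is shown, and the level names "completed" and "none" are assumed rather than taken from the data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df; these values are made up for illustration only.
toy <- data.frame(
  parent_education_level = c("high school", "high school",
                             "some college", "some college"),
  test_prep_course = c("completed", "none", "completed", "none"),
  reading = c(70, 64, 74, 69)
)

prep_gap <- toy %>%
  group_by(parent_education_level, test_prep_course) %>%
  summarise(avg_reading_score = mean(reading), .groups = "drop") %>%
  # One column per test prep group, so the two averages sit side by side
  pivot_wider(names_from = test_prep_course, values_from = avg_reading_score) %>%
  # Gap within each education level: "completed" minus "none" (assumed level names)
  mutate(gap = completed - none)

prep_gap
```

A table like this makes the within-level comparison explicit, which is harder to read from two separate facets.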