
Analyzing exam scores

Now let's move on to the competition and challenge.

📖 Background

Your best friend is an administrator at a large school. The school makes every student take year-end math, reading, and writing exams.

Since you have recently learned data manipulation and visualization, you suggest helping your friend analyze the score results. The school's principal wants to know if test preparation courses are helpful. She also wants to explore the effect of parental education level on test scores.

💾 The data

The file has the following fields (source):
  • "gender" - male / female
  • "race/ethnicity" - one of 5 combinations of race/ethnicity
  • "parent_education_level" - highest education level of either parent
  • "lunch" - whether the student receives free/reduced or standard lunch
  • "test_prep_course" - whether the student took the test preparation course
  • "math" - exam score in math
  • "reading" - exam score in reading
  • "writing" - exam score in writing
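The field list above can double as a quick sanity check after loading the CSV. This is a minimal sketch, assuming the column names are spelled exactly as listed and that the three score columns should be numeric; `check_schema`, `EXPECTED_COLUMNS` and `SCORE_COLUMNS` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Column names copied from the field description (assumed to match the CSV header exactly)
EXPECTED_COLUMNS = [
    "gender", "race/ethnicity", "parent_education_level", "lunch",
    "test_prep_course", "math", "reading", "writing",
]
SCORE_COLUMNS = ["math", "reading", "writing"]


def check_schema(df):
    """Return a list of problems found; an empty list means the frame looks as documented."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Scores are assumed to be numeric so that mean() and corr() work on them
    for col in SCORE_COLUMNS:
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
    return problems
```

After `df = pd.read_csv(...)`, calling `check_schema(df)` should return an empty list if the file matches the description.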

💪 Challenge

Create a report to answer the principal's questions. Include:

  1. What are the average reading scores for students with/without the test preparation course?
  2. What are the average scores for the different parental education levels?
  3. Create plots to visualize findings for questions 1 and 2.
  4. [Optional] Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels (e.g., faceted plots).
  5. [Optional 2] The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
  6. Summarize your findings.

Import libraries and load the data

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np


df=pd.read_csv("data/exams.csv")
df.head()

Check dtypes and null values

# Column data types (print, since only the last expression in a cell is displayed)
print(df.dtypes)
# Number of missing values per column
print(df.isnull().sum())

✅ What are the average reading scores for students with/without the test preparation course?

df_avgTestprepcourse=df.groupby(['test_prep_course','gender'],as_index=False)['reading'].mean()
print(df_avgTestprepcourse)
# Bar chart comparison
sns.catplot(data=df_avgTestprepcourse,kind="bar",y="test_prep_course",x="reading",col="gender")
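The grouping above splits the reading averages by gender as well; question 1 can also be answered directly with one overall mean per group and their difference. A minimal sketch, assuming `test_prep_course` holds the values `completed` and `none` (the categories shown as "Completed"/"None" in the plots below); `prep_course_gap` is a hypothetical helper and the frame is made-up toy data, not the exam results.

```python
import pandas as pd


def prep_course_gap(df, subject="reading"):
    """Mean score per test_prep_course group and the completed-minus-none difference.

    Assumes the test_prep_course column holds the values "completed" and "none".
    """
    means = df.groupby("test_prep_course")[subject].mean()
    return means, means["completed"] - means["none"]


# Tiny made-up frame to show the output shape (not the real exam data)
toy = pd.DataFrame({
    "test_prep_course": ["completed", "completed", "none", "none"],
    "reading": [80, 70, 65, 55],
})
means, gap = prep_course_gap(toy)
print(means)
print(f"gap: {gap:.1f}")
```

On the real data, `prep_course_gap(df)` gives the overall reading gap, and passing `subject="math"` or `subject="writing"` repeats it for the other exams.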

✅ What are the average scores for the different parental education levels?

# Select several columns with a list (tuple-style selection was removed in pandas 2.0)
df_avgParentalEd = df.groupby(['parent_education_level'], as_index=False)[['math', 'reading', 'writing']].mean()
print(df_avgParentalEd['parent_education_level'].unique())
print(df_avgParentalEd)

# Grouped bar chart: the three subjects side by side for each education level
levels = df_avgParentalEd['parent_education_level']
x = np.arange(len(levels))
width = 0.25
fig, ax = plt.subplots(figsize=(18.5, 10.5))
for i, subject in enumerate(['writing', 'reading', 'math']):
    # Offset each subject's bars so they sit next to each other instead of hiding one another
    ax.bar(x + (i - 1) * width, df_avgParentalEd[subject], width, label=subject.capitalize())
ax.set_xticks(x)
ax.set_xticklabels(levels, rotation=90)
# The y-axis starts at 61, which visually exaggerates the gaps between levels
ax.set_ylim([61, 80])
ax.set_ylabel("Average Score")
ax.legend()
fig.savefig('test2png.png', dpi=100, bbox_inches='tight')

# Students whose parents' highest education level is high school show a clear difference in average scores compared with students whose parents hold a master's degree.
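`groupby` sorts the education levels alphabetically, so the bars above are not in educational order, which makes the high-school vs master's comparison harder to read. A minimal sketch that turns the column into an ordered categorical; the label list is an assumption and should be checked against the `unique()` output printed earlier. `order_by_education` and `EDUCATION_ORDER` are hypothetical names, and the frame is toy data.

```python
import pandas as pd

# Assumed labels, ordered from lowest to highest education; check them against
# the output of df['parent_education_level'].unique() before relying on this list.
EDUCATION_ORDER = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]


def order_by_education(df, col="parent_education_level"):
    """Return a copy of df with col as an ordered categorical, sorted by that order."""
    out = df.copy()
    out[col] = pd.Categorical(out[col], categories=EDUCATION_ORDER, ordered=True)
    return out.sort_values(col).reset_index(drop=True)


toy = pd.DataFrame({
    "parent_education_level": ["master's degree", "high school", "some college"],
    "math": [72.0, 62.0, 67.0],
})
print(order_by_education(toy))
```

Any label that is not in the list becomes NaN, so a spelling mismatch shows up immediately rather than silently reordering the chart.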

✅ Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels.

  • Across all parental education levels, male students scored higher than female students in math but lower in reading and writing.
df_avgParentalEdAndTest = df.groupby(['parent_education_level', 'test_prep_course', 'gender'], as_index=False)[['math', 'reading', 'writing']].mean()
print(df_avgParentalEdAndTest)

def graph(df, x, y, col, hue):
    """Grid of bar charts: one row per subject in y, one column per value of col, bars split by hue."""
    levels = df[col].unique()
    fig, axes = plt.subplots(len(y), len(levels), figsize=(18.5, 10.5), sharey=True, squeeze=False)
    fig.suptitle("Parent Education Level")
    for i, subject in enumerate(y):
        for j, level in enumerate(levels):
            ax = axes[i, j]
            sns.barplot(data=df[df[col] == level], x=x, y=subject, hue=hue, ax=ax)
            ax.set_ylim([55, 85])
            # Education level as column title on the top row only
            ax.set_title(level if i == 0 else "")
            # x tick labels (completed / none) on the bottom row only
            ax.tick_params(axis='x', labelbottom=(i == len(y) - 1))
            ax.set_xlabel("")
            # Subject name as y label on the leftmost column only
            ax.set_ylabel(subject if j == 0 else "")
            # Drop the per-axes legend; one shared legend is added to the figure
            handles, labels = ax.get_legend_handles_labels()
            ax.get_legend().remove()
    fig.legend(handles, labels, loc='upper right', ncol=len(labels))
    plt.show()

graph(df_avgParentalEdAndTest, 'test_prep_course', ['math', 'reading', 'writing'], 'parent_education_level', 'gender')
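The faceted plots are easier to read next to a table that shows, for each parental education level, how much higher the average score is with the course than without it. A minimal sketch using `pivot_table`, assuming `test_prep_course` takes the values `completed` and `none`; `prep_gap_by_level` is a hypothetical helper and the frame is made-up toy data.

```python
import pandas as pd


def prep_gap_by_level(df, subjects=("math", "reading", "writing")):
    """Completed-minus-none mean score for each parental education level and subject.

    Assumes test_prep_course takes the values "completed" and "none".
    """
    table = df.pivot_table(
        index="parent_education_level",
        columns="test_prep_course",
        values=list(subjects),
        aggfunc="mean",
    )
    # Columns form a (subject, test_prep_course) MultiIndex; subtract per subject
    return pd.DataFrame({s: table[(s, "completed")] - table[(s, "none")] for s in subjects})


# Made-up rows (not the real exam data)
toy = pd.DataFrame({
    "parent_education_level": ["high school"] * 4 + ["master's degree"] * 2,
    "test_prep_course": ["completed", "completed", "none", "none", "completed", "none"],
    "math": [70, 60, 55, 65, 80, 74],
    "reading": [75, 65, 60, 60, 85, 80],
    "writing": [74, 66, 58, 62, 86, 78],
})
print(prep_gap_by_level(toy))
```

A positive value means students who completed the course scored higher on average at that education level; `prep_gap_by_level(df)` produces the same table for the real data.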

✅ The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.

  • The correlation matrix shows that writing and reading scores are strongly positively correlated: students who do well in one tend to do well in the other.
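The correlation matrix behind this observation can be computed in one pandas call. A minimal sketch, assuming Pearson correlation on the three score columns; `score_correlations` is a hypothetical helper and the scores below are made up to show the shape of the result.

```python
import pandas as pd


def score_correlations(df, subjects=("math", "reading", "writing"), method="pearson"):
    """Pairwise correlation matrix between the score columns."""
    return df[list(subjects)].corr(method=method)


# Made-up scores: writing tracks reading exactly, math less closely
toy = pd.DataFrame({
    "math": [50, 80, 60, 70, 90],
    "reading": [55, 60, 70, 80, 90],
    "writing": [56, 61, 71, 81, 91],
})
corr = score_correlations(toy)
print(corr.round(2))
```

On the real data, `sns.heatmap(score_correlations(df), annot=True, vmin=-1, vmax=1)` draws the matrix with the coefficients written in each cell.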