
â„šī¸ Exam Scores Descriptive Analysis

💾 The data

The file has the following fields (source):
  • "gender" - male / female
  • "race/ethnicity" - one of 5 combinations of race/ethnicity
  • "parent_education_level" - highest education level of either parent
  • "lunch" - whether the student receives free/reduced or standard lunch
  • "test_prep_course" - whether the student took the test preparation course
  • "math" - exam score in math
  • "reading" - exam score in reading
  • "writing" - exam score in writing

Contents

  1. Import necessary libraries and load the dataset
  2. What are the average reading scores for students with/without the test preparation course?
  3. What are the average scores for the different parental education levels?
  4. Create plots to visualize findings for questions 1 and 2.
  5. Look at the effects within subgroups: compare the average scores for students with/without the test preparation course across parental education levels (e.g., with faceted plots).
  6. The principal wants to know if kids who perform well on one subject also score well on the others. Look at the correlations between scores.
  7. Summary.

1. Import necessary libraries and load the dataset

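import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

import warnings
# Hide library warnings (e.g. deprecation notices) so they don't clutter the notebook output
warnings.simplefilter("ignore")
# Load the exam scores and preview the first five rows
df = pd.read_csv('data/exams.csv')
df.head()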
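The field list in "The data" section can be checked against the loaded frame before any analysis. This is a minimal sketch that assumes the CSV headers match the documented names exactly; `check_schema` is a hypothetical helper, not part of the original notebook, and the toy rows are illustrative only.

```python
import pandas as pd

# Column names as documented in "The data" section
EXPECTED_COLUMNS = [
    "gender", "race/ethnicity", "parent_education_level", "lunch",
    "test_prep_course", "math", "reading", "writing",
]


def check_schema(df: pd.DataFrame):
    """Hypothetical helper: list documented columns that are absent, plus null counts."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    present = [c for c in EXPECTED_COLUMNS if c in df.columns]
    nulls = df[present].isna().sum()  # Series: column name -> number of missing values
    return missing, nulls


# Toy frame for illustration only (not the real data); "lunch" is deliberately left out
toy = pd.DataFrame({
    "gender": ["female", "male"],
    "race/ethnicity": ["group A", "group B"],
    "parent_education_level": ["high school", None],
    "test_prep_course": ["none", "completed"],
    "math": [70, 80],
    "reading": [75, 85],
    "writing": [72, 88],
})
missing, nulls = check_schema(toy)
print(missing)  # ['lunch']
print(nulls)
```

On the real file this would be called as `check_schema(df)` right after `pd.read_csv`.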

2. [Ques 1] What are the average reading scores for students with/without the test preparation course?

df.groupby('test_prep_course')[['reading']].mean()
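A mean on its own hides how many students stand behind it. The sketch below adds the group size to the question 1 summary and computes the gap between the two groups. The `"completed"`/`"none"` labels and the toy rows are assumptions for illustration; check `df['test_prep_course'].unique()` on the real file.

```python
import pandas as pd


def reading_by_prep(df: pd.DataFrame) -> pd.DataFrame:
    """Mean reading score and number of students for each test-prep status."""
    return df.groupby("test_prep_course")["reading"].agg(["mean", "count"])


# Toy data for illustration only; the category labels are assumed
toy = pd.DataFrame({
    "test_prep_course": ["completed", "none", "none", "completed"],
    "reading": [80, 60, 70, 90],
})
summary = reading_by_prep(toy)
print(summary)

# Difference in mean reading score between the two groups
gap = summary.loc["completed", "mean"] - summary.loc["none", "mean"]
print(gap)  # 20.0
```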

3. [Ques 2] What are the average scores for the different parental education levels?

df.groupby('parent_education_level')[['math','reading','writing']].mean()
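`groupby` sorts the education levels alphabetically, which scrambles their natural order. One way to restore it is an ordered categorical. The level names below are an assumption taken from the widely circulated version of this dataset; confirm them with `df['parent_education_level'].unique()`, because any value missing from the list becomes NaN and is silently dropped from the result. `scores_by_education` is a hypothetical helper.

```python
import pandas as pd

# ASSUMED level names and order; verify against the real file before use
EDU_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]


def scores_by_education(df: pd.DataFrame) -> pd.DataFrame:
    """Mean math/reading/writing per parental education level, in education order."""
    # Values not listed in EDU_ORDER become NaN and are dropped by groupby
    levels = pd.Categorical(df["parent_education_level"],
                            categories=EDU_ORDER, ordered=True)
    return (
        df.assign(parent_education_level=levels)
        # observed=True keeps only levels that actually occur in the data
        .groupby("parent_education_level", observed=True)[["math", "reading", "writing"]]
        .mean()
    )


# Toy data for illustration only
toy = pd.DataFrame({
    "parent_education_level": ["master's degree", "high school",
                               "some high school", "high school"],
    "math": [90, 60, 50, 70],
    "reading": [92, 64, 55, 72],
    "writing": [94, 62, 52, 74],
})
result = scores_by_education(toy)
print(result)
```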

4. Create plots to visualize findings for questions 1 and 2

# Bar heights are group means; seaborn's default error bars are 95% confidence intervals
sns.catplot(x='test_prep_course', y='reading', data=df, kind='bar').set(title='Average reading score with/without the test preparation course')

sns.catplot(x='parent_education_level', y='math', data=df, kind='bar').set(title='Average math score by parental education level')
plt.xticks(rotation=90)  # rotate the long education labels so they don't overlap

sns.catplot(x='parent_education_level', y='reading', data=df, kind='bar').set(title='Average reading score by parental education level')
plt.xticks(rotation=90)

sns.catplot(x='parent_education_level', y='writing', data=df, kind='bar').set(title='Average writing score by parental education level')
plt.xticks(rotation=90)
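The three education plots above differ only in the subject column, so they can also be drawn as one figure with a panel per subject. This sketch reshapes the scores to long format with `melt` and facets on a new `subject` column; `plot_scores_by_education` is a hypothetical helper and the toy rows are illustrative only.

```python
import pandas as pd
import seaborn as sns

SUBJECTS = ["math", "reading", "writing"]


def plot_scores_by_education(df: pd.DataFrame):
    """One figure, one panel per subject: mean score by parental education level."""
    # Long format: one row per (student, subject) pair
    long = df.melt(
        id_vars=["parent_education_level"],
        value_vars=SUBJECTS,
        var_name="subject",
        value_name="score",
    )
    g = sns.catplot(data=long, x="parent_education_level", y="score",
                    col="subject", kind="bar")
    # plt.xticks only affects the current axes, so rotate every panel explicitly
    for ax in g.axes.flat:
        ax.tick_params(axis="x", labelrotation=90)
    g.set_titles("{col_name}")
    return g


# Toy data for illustration only
toy = pd.DataFrame({
    "parent_education_level": ["high school", "master's degree", "high school"],
    "math": [60, 90, 70],
    "reading": [65, 92, 75],
    "writing": [62, 94, 73],
})
g = plot_scores_by_education(toy)
```

Because `catplot` shares the y axis across panels by default, the three subjects stay directly comparable.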

5. Look at the effects within subgroups. Compare the average scores for students with/without the test preparation course for different parental education levels
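One way to approach this question: tabulate the mean score in each (education level, prep status) cell, take the completed-minus-none difference within each education level, and facet the bars by subject with test prep as the hue. This is a sketch under assumptions: the helper names are hypothetical, the `"completed"`/`"none"` labels are assumed, and the toy rows only illustrate the mechanics.

```python
import pandas as pd
import seaborn as sns

SUBJECTS = ["math", "reading", "writing"]


def prep_gap_table(df: pd.DataFrame) -> pd.DataFrame:
    """Mean score per education level (rows) by (subject, test prep) (columns)."""
    return df.pivot_table(
        index="parent_education_level",
        columns="test_prep_course",
        values=SUBJECTS,
        aggfunc="mean",
    )


def plot_prep_by_education(df: pd.DataFrame):
    """Faceted bars: one panel per subject, bars split by test-prep status."""
    long = df.melt(
        id_vars=["parent_education_level", "test_prep_course"],
        value_vars=SUBJECTS,
        var_name="subject",
        value_name="score",
    )
    g = sns.catplot(data=long, x="parent_education_level", y="score",
                    hue="test_prep_course", col="subject", kind="bar")
    for ax in g.axes.flat:
        ax.tick_params(axis="x", labelrotation=90)
    return g


# Toy data for illustration only; labels are assumed, not read from the real file
toy = pd.DataFrame({
    "parent_education_level": ["high school"] * 4 + ["master's degree"] * 2,
    "test_prep_course": ["completed", "none", "completed", "none", "completed", "none"],
    "math": [70, 60, 80, 50, 90, 85],
    "reading": [75, 65, 85, 55, 95, 88],
    "writing": [72, 62, 82, 52, 96, 86],
})
table = prep_gap_table(toy)
# Within-subgroup effect for math: completed minus none, per education level
math_gap = table[("math", "completed")] - table[("math", "none")]
print(math_gap)
g = plot_prep_by_education(toy)
```

Splitting the students this way produces many small cells, so it is worth checking the group sizes (e.g. with `pd.crosstab(df['parent_education_level'], df['test_prep_course'])`) before reading much into any single gap.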
