NYC Public School Test Result Scores

Project Description

Every year, American high school students take the SAT, a standardized test intended to measure literacy, numeracy, and writing skills. It has three sections (reading, math, and writing), each with a maximum score of 800 points. The results matter to both students and colleges because they play a pivotal role in the admissions process.

Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.

This project uses the schools.csv dataset, previewed below, to explore New York City (NYC) public school SAT performance.

Analysis

Preview of the dataset

import pandas as pd

# Read and preview the data
schools = pd.read_csv("schools.csv")

schools.info()
schools.head()

NYC schools with the best math results

# Keep schools whose average math score is at least 80% of the 800-point maximum (640)
best_math_schools = schools[schools['average_math'] >= 800 * 0.8][['school_name', 'borough', 'average_math']].sort_values('average_math', ascending=False)

best_math_schools
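
The cutoff used above is 80% of the 800-point section maximum, which works out to 640. As a quick sanity check of the filter, here is a minimal sketch on a made-up three-row frame. The rows, the school names, and the names `toy`, `MAX_SECTION_SCORE`, and `MATH_CUTOFF` are illustrative assumptions and do not come from schools.csv.

```python
import pandas as pd

MAX_SECTION_SCORE = 800                   # per-section maximum stated in the project description
MATH_CUTOFF = 0.8 * MAX_SECTION_SCORE     # 80% of the maximum

# Made-up rows for illustration only; not taken from schools.csv
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "borough": ["Borough X", "Borough Y", "Borough X"],
    "average_math": [640, 639, 700],
})

# Same filter-then-sort pattern as above; .loc selects rows and columns in one step
toy_best_math = (
    toy.loc[toy["average_math"] >= MATH_CUTOFF, ["school_name", "borough", "average_math"]]
    .sort_values("average_math", ascending=False)
)
print(toy_best_math)
```

Because the comparison is `>=`, a school sitting exactly on the cutoff (School A here) is kept, while one a single point below it (School B) is dropped.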

Top 10 performing schools based on the combined SAT scores

# Combined SAT score: sum of the three section averages
schools['total_SAT'] = schools['average_math'] + schools['average_reading'] + schools['average_writing']
# Ten schools with the highest combined score
top_10_schools = schools[['school_name', 'borough', 'total_SAT']].sort_values('total_SAT', ascending=False).head(10)

top_10_schools
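
For reference, the sort-then-head ranking can also be written with `DataFrame.nlargest`. The sketch below shows that alternative on made-up rows; the data and the names `toy` and `toy_top_2` are assumptions for illustration, not values from schools.csv.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from schools.csv
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [500, 700, 600, 450],
    "average_reading": [480, 690, 610, 440],
    "average_writing": [470, 680, 590, 430],
})

# Combined score, computed the same way as total_SAT above
toy["total_SAT"] = toy["average_math"] + toy["average_reading"] + toy["average_writing"]

# nlargest(n, column) returns the n rows with the highest values, already sorted descending
toy_top_2 = toy.nlargest(2, "total_SAT")[["school_name", "total_SAT"]]
print(toy_top_2)
```

Note that adding columns with `+` yields NaN for any school missing one of the three section scores, so such schools would not appear in either version of the ranking.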

Single borough with the largest standard deviation in the combined SAT score

borough_stats = schools.groupby('borough').agg(
    num_schools=('school_name', 'count'),
    average_SAT=('total_SAT', 'mean'),
    std_SAT=('total_SAT', 'std')
).reset_index()

# Find the borough with the largest std_SAT and round the SAT statistics to two decimal places
# (rounding in the same chain avoids assigning into a slice of borough_stats)
largest_std_dev = borough_stats.sort_values('std_SAT', ascending=False).head(1).round({'average_SAT': 2, 'std_SAT': 2})

largest_std_dev = largest_std_dev[['borough', 'std_SAT', 'num_schools', 'average_SAT']]

largest_std_dev
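
To make the spread comparison concrete, here is a small sketch of the same named aggregation on made-up data, using `idxmax` to pick the group with the largest standard deviation. The borough labels, scores, and the names `toy`, `toy_stats`, and `widest` are hypothetical. pandas computes `std` as the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from schools.csv
toy = pd.DataFrame({
    "borough": ["Borough X", "Borough X", "Borough Y", "Borough Y", "Borough Y"],
    "school_name": ["S1", "S2", "S3", "S4", "S5"],
    "total_SAT": [1000, 1400, 1200, 1210, 1190],
})

# Per-borough school count, mean, and sample standard deviation of the combined score
toy_stats = toy.groupby("borough").agg(
    num_schools=("school_name", "count"),
    average_SAT=("total_SAT", "mean"),
    std_SAT=("total_SAT", "std"),
).round(2)

# idxmax returns the index label (here, the borough) of the largest std_SAT
widest = toy_stats["std_SAT"].idxmax()
print(widest, toy_stats.loc[widest, "std_SAT"])
```

Both toy boroughs have the same mean of 1200, yet Borough X has a far larger spread, which is the kind of difference that the average alone would hide.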

Rating of boroughs by the average SAT score

import matplotlib.pyplot as plt

# Sort boroughs by average_SAT for better visualization
borough_stats_sorted = borough_stats.sort_values('average_SAT', ascending=True)

plt.figure(figsize=(8, 5))
plt.barh(borough_stats_sorted['borough'], borough_stats_sorted['average_SAT'], color='skyblue')
plt.xlabel('Average SAT Score')
plt.ylabel('Borough')
plt.title('Average SAT Score by Borough')
plt.tight_layout()
plt.show()

Distribution of total SAT scores by borough

import matplotlib.pyplot as plt

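# pandas creates its own figure when `by` is used, so the size is passed via figsize
# instead of calling plt.figure(), which would leave an empty extra figure
schools.boxplot(column='total_SAT', by='borough', vert=False, grid=False, figsize=(10, 6))
plt.title('')  # clear the automatic per-column title
plt.suptitle('Distribution of Total SAT Scores by Borough')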
plt.xlabel('Total SAT Score')
plt.ylabel('Borough')
plt.tight_layout()
plt.show()