Skip to content
NYC Public School Test Result Scores
NYC Public School Test Result Scores
Project Description
Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections - reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
This project uses schools.csv dataset previewed below to explore New York City (NYC) public school SAT performance.
Analysis Conduction
Preview of the dataset
import pandas as pd
# Read and preview the data
schools = pd.read_csv("schools.csv")
schools.info()
schools.head()NYC schools with the best math results
best_math_schools = schools[schools['average_math'] >= 800*0.8][['school_name', 'borough', 'average_math']].sort_values('average_math', ascending=False)
best_math_schoolsTop 10 performing schools based on the combined SAT scores
schools['total_SAT'] = schools['average_math'] + schools['average_reading'] + schools['average_writing']
top_10_schools = schools[['school_name', 'borough', 'total_SAT']].sort_values('total_SAT', ascending=False).head(10)
top_10_schoolsSingle borough with the largest standard deviation in the combined SAT score
borough_stats = schools.groupby('borough').agg(
num_schools=('school_name', 'count'),
average_SAT=('total_SAT', 'mean'),
std_SAT=('total_SAT', 'std')
).reset_index()
# Find the borough with the largest std_SAT
largest_std_dev = borough_stats.sort_values('std_SAT', ascending=False).head(1)
# Round SAT scores to two decimal places
largest_std_dev['average_SAT'] = largest_std_dev['average_SAT'].round(2)
largest_std_dev['std_SAT'] = largest_std_dev['std_SAT'].round(2)
largest_std_dev = largest_std_dev[['borough', 'std_SAT', 'num_schools', 'average_SAT']]
largest_std_devRating of boroughs by the average SAT score
import matplotlib.pyplot as plt
# Sort boroughs by average_SAT for better visualization
borough_stats_sorted = borough_stats.sort_values('average_SAT', ascending=True)
plt.figure(figsize=(8, 5))
plt.barh(borough_stats_sorted['borough'], borough_stats_sorted['average_SAT'], color='skyblue')
plt.xlabel('Average SAT Score')
plt.ylabel('Borough')
plt.title('Average SAT Score by Borough')
plt.tight_layout()
plt.show()Distribution of total SAT scores by borough
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
schools.boxplot(column='total_SAT', by='borough', vert=False, grid=False)
plt.suptitle('Distribution of Total SAT Scores by Borough')
plt.xlabel('Total SAT Score')
plt.ylabel('Borough')
plt.tight_layout()
plt.show()