Skip to content

1 hidden cell

Analyzing NYC Public School SAT Performance ๐ŸŽ“๐Ÿ“Š

Executive Summary

This project analyzes SAT performance across New York City public high schools, focusing on reading, math, and writing scores. Using Python and pandas, the analysis identifies schools excelling in mathematics, the top overall performers by total SAT score, and the borough with the greatest variation in results. The findings help stakeholders โ€” from policymakers to parents โ€” understand academic disparities and highlight schools that consistently achieve strong outcomes.

Business Problem

Academic performance insights are crucial for driving improvements in education quality and ensuring equitable opportunities for students.
New York Cityโ€™s Department of Education aims to identify:

  • Which schools achieve the highest math scores.
  • The top 10 overall performing schools based on total SAT results.
  • The borough with the widest variation in scores, indicating possible inequality or differing resource availability.

By exploring these questions, this project provides a data-driven foundation for educational policy and community decision-making.

Methodology

Data Source
Analyzed schools.csv, containing school names, boroughs, and average SAT scores in reading, writing, and math.

Data Preparation
Loaded and inspected the dataset using pandas and numpy.
Created a new column total_SAT = sum of all three section averages per school.

Analytical Steps
Q1: Top Math Schools: Identified schools with average math scores above 640.
Q2: Top 10 Schools Overall: Ranked schools by total SAT score to find the overall top 10.
Q3: Borough with Largest Score Variation: Measured standard deviation across boroughs to determine where performance was most variable.

Skills

Python
Libraries: pandas, numpy
Key Functions: .groupby(), .agg(), .sort_values(), .loc[], .sum(axis=1)
Data cleaning, aggregation, and summary statistics for education analytics.

Results

Top Math Schools: Identified multiple NYC schools achieving exceptional math averages (โ‰ฅ640), showcasing strong STEM programs.
Top 10 Overall: The highest-performing schools across NYC displayed well-rounded excellence, with top SAT scores averaging 1340.13 in Manhattan, 1345.48 in Queens, and 1439.00 in Staten Island.
Borough Variation: Manhattan had the highest standard deviation in SAT scores (230.29), indicating the greatest disparity in educational quality or resource distribution.

Recommendations:

Target support for lower-performing schools in high-variance boroughs to close achievement gaps.
Replicate best practices from top-performing schools citywide, especially in math and literacy programs.
Use these insights to inform resource allocation and parental school selection decisions.

Next Steps

Visualize results through bar charts and boxplots to better communicate performance differences.
Enrich the dataset with socioeconomic or demographic data to analyze performance drivers.
Conduct time-series analysis on multi-year SAT data to track improvement trends and policy impact.

๐Ÿ”๐Ÿ“ˆ๐Ÿ“Š Analysis

# Re-run this cell 
import pandas as pd
import numpy as np
# Read in the data
schools = pd.read_csv("schools.csv")

# Preview the data
#schools.head()


#Which NYC schools have the best math results?
math_schools = schools.loc[:, ['school_name', 'average_math']].sort_values('average_math', ascending = False)
print(math_schools)

best_math_schools = math_schools[math_schools['average_math'] >= 640]
print(best_math_schools)


#What are the top 10 performing schools based on the combined SAT scores?
schools['total_SAT'] = schools[['average_math', 'average_reading', 'average_writing']].sum(axis=1)
schools_ordered_SAT= schools[['school_name', 'total_SAT']].sort_values('total_SAT', ascending = False)
top_10_schools= schools_ordered_SAT.iloc[0:10, :]
print(top_10_schools)

#Which single borough has the largest standard deviation in the combined SAT score?
boroughs = schools.groupby('borough')['total_SAT'].agg(['count', 'mean', 'std']).round(2)
print(boroughs)

largest_std_dev = boroughs[boroughs['std'] == boroughs['std'].max()]
print(largest_std_dev)

largest_std_dev = largest_std_dev.rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"})
print(largest_std_dev)