Photo by Jannis Lucas on Unsplash.
Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections - reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
You have been provided with a dataset called schools.csv, which is previewed below.
You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.
# Re-run this cell
import pandas as pd
# Read in the data
schools = pd.read_csv("schools.csv")
# Preview the data
schools.head()
# Start coding here...
# Add as many cells as you like...# Question: Find the schools with the best math scores; 80% out of a possible 800
# Calculate the threshold for "best" math scores: 80% of the maximum possible SAT math score (800)
math_score_threshold = 0.8 * 800 # 80% of 800 is 640
# Filter the schools DataFrame to include only schools with average math scores >= 640
high_math_score_schools = schools[schools["average_math"] >= math_score_threshold]
# Select only the school name and average math score columns, and sort by average math score in descending order
top_math_schools = high_math_score_schools[["school_name", "average_math"]].sort_values(
by="average_math", ascending=False
)
# Display the resulting DataFrame of top math schools
top_math_schools# Question: Top 10 performing schools based on combined SAT scores
# Calculate the combined SAT score for each school by summing the average scores for math, reading, and writing
schools["total_SAT"] = schools["average_math"] + schools["average_reading"] + schools["average_writing"]
# Sort the schools DataFrame in descending order based on the total SAT score
schools_sorted_by_total_SAT = schools.sort_values("total_SAT", ascending=False)
# Select only the 'school_name' and 'total_SAT' columns for the top 10 schools with the highest total SAT scores
top_10_schools_by_total_SAT = schools_sorted_by_total_SAT[["school_name", "total_SAT"]].head(10)
# Display the top 10 schools with the highest combined SAT scores
top_10_schools_by_total_SAT# Question: Which single borough has the largest standard deviation in the combined SAT score?
# Group the schools DataFrame by 'borough' and calculate the count, mean, and standard deviation of total SAT scores for each borough
borough_sat_stats = schools.groupby('borough')["total_SAT"].agg(["count", "mean", "std"]).round(2)
# Find the borough with the highest standard deviation in combined SAT scores
borough_with_largest_std = borough_sat_stats[borough_sat_stats["std"] == borough_sat_stats["std"].max()]
# Rename the columns for clarity
borough_with_largest_std.columns = ["num_schools", "average_SAT", "std_SAT"]
# Display the borough with the largest standard deviation in combined SAT scores
borough_with_largest_std