Photo by Jannis Lucas on Unsplash.
Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections - reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
You have been provided with a dataset called schools.csv
, which is previewed below.
You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.
# Re-run this cell
import pandas as pd
# Read in the data
schools = pd.read_csv("schools.csv")
# Preview the data
schools.head()
# Start coding here...
# Add as many cells as you like...
Which NYC schools have the best math results?
The best math results are at least 80% of the maximum possible score of 800 for math. Save your results in a pandas DataFrame called best_math_schools, including "school_name" and "average_math" columns, sorted by "average_math" in descending order.
# Finding best score
best_math_schools = schools[schools["average_math"] > 800 * 0.8][["school_name", "average_math"]].sort_values("average_math", ascending=False)
What are the top 10 performing schools based on the combined SAT scores?
Save your results as a pandas DataFrame called top_10_schools containing the "school_name" and a new column named "total_SAT", with results ordered by "total_SAT" in descending order ("total_SAT" being the sum of math, reading, and writing scores).
#Adding new column
schools["total_SAT"] = schools["average_math"]+schools["average_reading"]+schools["average_writing"]
top_schools = schools.sort_values("total_SAT", ascending = False).iloc[0:10]
top_10_schools = top_schools[["school_name","total_SAT"]]
print(top_10_schools)
Which single borough has the largest standard deviation in the combined SAT score?
Save your results as a pandas DataFrame called largest_std_dev.
The DataFrame should contain one row, with: -"borough" - the name of the NYC borough with the largest standard deviation of "total_SAT". -"num_schools" - the number of schools in the borough. -"average_SAT" - the mean of "total_SAT". -"std_SAT" - the standard deviation of "total_SAT".
Round all numeric values to two decimal places.
# Calculation of standard deviation and mean
schools_grouped = schools.groupby('borough')["total_SAT"].agg(["std", "mean", "count"]).round(2).sort_values("std", ascending=False)
# Renaming columns
schools_grouped.rename(columns={"std": "std_dev", "mean": "average_SAT", "count": "num_schools"}, inplace=True)
# Adding the std_SAT column
schools_grouped["std_SAT"] = schools_grouped["std_dev"]
# Creating the object largest_std_dev with a single row
largest_std_dev = schools_grouped.head(1)[["num_schools", "average_SAT", "std_dev", "std_SAT"]]
print(largest_std_dev)