Photo by Jannis Lucas on Unsplash.
Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections - reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
You have been provided with a dataset called schools.csv, which is previewed below.
You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.
# Re-run this cell
import pandas as pd
# Read in the data
schools = pd.read_csv("schools.csv")
# Preview the data
schools.head()Which NYC schools have the best math results?
- The best math results are at least 80% of the maximum possible score of 800 for math.
- Save your results in a pandas DataFrame called
best_math_schools, including"school_name"and"average_math"columns, sorted by"average_math"in descending order.
# Sort the highest math scores
best_math_schools = schools.sort_values("average_math", ascending=False)
best_math_schools = best_math_schools[best_math_schools["average_math"] >= 640]
best_math_schools = best_math_schools[["school_name", "average_math"]]
print(best_math_schools)What are the top 10 performing schools based on the combined SAT scores?
- Save your results as a pandas DataFrame called
top_10_schoolscontaining the"school_name"and a new column named"total_SAT", with results ordered by"total_SAT"in descending order ("total_SAT"being the sum of math, reading, and writing scores).
# Add total_SAT to schools DataFrame
schools["total_SAT"] = schools[["average_math", "average_reading", "average_writing"]].sum(axis=1)
# Filter SAT scores from descending order
SAT_srt = schools.sort_values("total_SAT", ascending=False)
# Create a DataFrame with top 10 schools
top_10_schools = SAT_srt[["school_name", "total_SAT"]].head(10)
print(top_10_schools)Which single borough has the largest standard deviation in the combined SAT score?
- Save your results as a pandas DataFrame called largest_std_dev.
- The DataFrame should contain one row, with:
"borough"- the name of the NYC borough with the largest standard deviation of "total_SAT"."num_schools"- the number of schools in the borough."average_SAT"- the mean of "total_SAT"."std_SAT"- the standard deviation of "total_SAT".
- Round all numeric values to two decimal places.
# Group by borough and calculate statistics
borough_stats = schools.groupby("borough").agg(
num_schools=("total_SAT", "size"),
average_SAT=("total_SAT", "mean"),
std_SAT=("total_SAT", "std")
)
# Find the index (borough) with the largest standard deviation
largest_std_dev_borough = borough_stats["std_SAT"].idxmax()
# Find the row with the largest standard deviation
largest_std_dev_row = borough_stats.loc[largest_std_dev_borough]
# Create a DataFrame and reset index to include borough as a column
largest_std_dev = pd.DataFrame([largest_std_dev_row]).round(2)
largest_std_dev["borough"] = largest_std_dev_borough
# Reset index and reorder columns
largest_std_dev = largest_std_dev.reset_index(drop=True)
largest_std_dev = largest_std_dev[["borough", "num_schools", "average_SAT", "std_SAT"]]
# Display the result
print(largest_std_dev)