Photo by Jannis Lucas on Unsplash.
Every year, American high school students take the SAT, a standardized test intended to measure literacy, numeracy, and writing skills. It has three sections (reading, math, and writing), each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
You have been provided with a dataset called schools.csv, which is previewed below.
You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.
# Re-run this cell
import pandas as pd
# Read in the data
schools = pd.read_csv("schools.csv")
# Preview the data
schools.head()
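Because the later steps add the three section averages together, it may be worth checking the score columns for missing values first. The sketch below uses a small made-up frame rather than schools.csv: the school names and numbers are hypothetical, and only the column names are taken from the dataset. It shows how a single missing section score propagates into a row-wise sum.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names mirror schools.csv
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "average_math": [650, 400, np.nan],
    "average_reading": [600, 410, 500],
    "average_writing": [610, 395, 480],
})

score_cols = ["average_math", "average_reading", "average_writing"]

# Count missing values in each score column
missing_per_column = toy[score_cols].isna().sum()
print(missing_per_column)

# Row-wise sum that stays NaN when any section score is missing,
# which matches what adding the columns with + would give
total = toy[score_cols].sum(axis=1, skipna=False)
print(total)
```

If the counts are all zero on the real data, the total score can be computed without any extra handling.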
# Start coding here...
# Add as many cells as you like...
# Quick info
schools.info()
# Finding schools with the best math scores
# Threshold for schools to be considered best in math (80% of the 800-point maximum)
math_threshold = 0.8 * 800
# Keep school_name and average_math for schools at or above the threshold
best_math_schools = schools[schools["average_math"] >= math_threshold][["school_name", "average_math"]]
# Sort from highest to lowest average math score
best_math_schools = best_math_schools.sort_values("average_math", ascending=False)
# Print the result
print(best_math_schools)
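The same filter-then-sort pattern can be tried on a tiny frame to see exactly which rows survive. This is only a sketch: the three schools and their scores are invented for illustration, and it uses `.loc` to apply the row mask and the column selection in one step, which is a stylistic alternative to the chained indexing above, not a change in logic.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror schools.csv
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "average_math": [655, 700, 600],
})

# 80% of the 800-point maximum for one section
math_threshold = 0.8 * 800

# Row mask and column selection in a single .loc call, then sort descending
best = toy.loc[
    toy["average_math"] >= math_threshold,
    ["school_name", "average_math"],
].sort_values("average_math", ascending=False)

print(best)
```

School C falls below the threshold and is dropped, and the remaining rows come back ordered from the highest math average down.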
# Visualizing the schools with the best math results
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.bar(best_math_schools["school_name"], best_math_schools["average_math"], color="skyblue")
plt.xlabel("School Name")
plt.ylabel("Average Math Score")
plt.title("Schools with Best Math Results (80% or Higher)")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()  # Adjust layout to prevent overlap
plt.show()
# 10 best-performing schools based on the total score across the three SAT sections
# Create a total_SAT column by summing the three section averages
schools["total_SAT"] = schools["average_math"] + schools["average_reading"] + schools["average_writing"]
# Sort by total_SAT in descending order and keep the top 10 schools
top_10_schools = schools.sort_values("total_SAT", ascending=False).head(10)
top_10_schools = top_10_schools[["school_name", "total_SAT"]]
print(top_10_schools)
plt.figure(figsize=(10, 6))
plt.bar(top_10_schools["school_name"], top_10_schools["total_SAT"], color="skyblue")
plt.xlabel("School Name")
plt.ylabel("Total SAT Score")
plt.title("Top 10 Performing Schools Based on Total SAT Scores")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
# Which single borough has the largest standard deviation in the combined SAT score?
# Per-borough school count, mean, and standard deviation of total_SAT
group_schools = schools.groupby("borough")
borough_stats = group_schools["total_SAT"].agg(num_schools="count", average_SAT="mean", std_SAT="std").round(2).reset_index()
# Find the borough with the largest standard deviation
largest_std_dev = borough_stats[borough_stats["std_SAT"] == borough_stats["std_SAT"].max()]
print(largest_std_dev)
# Bar plot to visualize the standard deviation of SAT scores across boroughs
plt.figure(figsize=(10, 6))
plt.bar(borough_stats["borough"], borough_stats["std_SAT"], color="skyblue")
plt.xlabel("Borough")
plt.ylabel("Standard Deviation of Total SAT Score")
plt.title("Standard Deviation of Total SAT Scores by Borough")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()
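As a cross-check on the borough result, the per-group standard deviation can also be located with `idxmax`. The sketch below runs on invented totals for two boroughs (the numbers are hypothetical, chosen so the answer is easy to verify by hand) and is an alternative to the equality filter used above, not a replacement for it.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the analysis above
toy = pd.DataFrame({
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Bronx", "Bronx", "Bronx"],
    "total_SAT": [1000, 1500, 2000, 1200, 1250, 1300],
})

# Sample standard deviation (ddof=1, the pandas default) of total_SAT per borough
std_by_borough = toy.groupby("borough")["total_SAT"].std()

# Index label of the largest value, i.e. the borough with the widest spread
widest_borough = std_by_borough.idxmax()

print(std_by_borough)
print(widest_borough)
```

One difference worth keeping in mind: `idxmax` returns only the first label when two groups tie, whereas filtering on equality with the maximum keeps every tied row.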