Photo by Jannis Lucas on Unsplash.
Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections - reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
You have been provided with a dataset called schools.csv, which is previewed below.
You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.
# Re-run this cell
import pandas as pd
# Read in the data
schools = pd.read_csv("schools.csv")
# Preview the data
schools.head()
# Start coding here...
# Add as many cells as you like...Which NYC schools have the best math results?
##The best math results are at least 80% of the *maximum possible score of 800* for math.
##Save your results in a pandas DataFrame called best_math_schools, including "school_name" and "average_math" columns, sorted by "average_math" in descending order.
best_math_schools_pre=schools[schools["average_math"]>=0.8*800].sort_values("average_math", ascending=False)
best_math_schools=best_math_schools_pre[["school_name","average_math"]]
print(best_math_schools)What are the top 10 performing schools based on the combined SAT scores?
##Save your results as a pandas DataFrame called top_10_schools containing the "school_name" and a new column named "total_SAT", with results ordered by "total_SAT" in descending order ("total_SAT" being the sum of math, reading, and writing scores).
totSAT=schools[["average_math","average_reading","average_writing"]]
schools["total_SAT"]=totSAT.sum(axis="columns")
top_10_schools=schools[["school_name","total_SAT"]].sort_values("total_SAT",ascending=False).head(10)
print(top_10_schools)Which single borough has the largest standard deviation in the combined SAT score?
###Save your results as a pandas DataFrame called largest_std_dev.
#The DataFrame should contain one row, with:
#borough" - the name of the NYC borough with the largest standard deviation of "total_SAT".
#num_schools" - the number of schools in the borough.
#"average_SAT" - the mean of "total_SAT".
#"std_SAT" - the standard deviation of "total_SAT".
#Round all numeric values to two decimal places.
std_dev_borough=schools.groupby("borough")["total_SAT"].std().round(2)
schools["std_dev_borough"]=schools["borough"].map(std_dev_borough)
std_SAT=schools["std_dev_borough"].max()
max_std=schools[schools["std_dev_borough"]==std_SAT]
borough=max_std.loc[0,"borough"]
num_schools=max_std["school_name"].count()
average_SAT=max_std["total_SAT"].mean().round(2)
largest_std_dev_dict={"borough":[borough],"num_schools":[num_schools],"average_SAT":[average_SAT],"std_SAT":[std_SAT]}
largest_std_dev=pd.DataFrame(largest_std_dev_dict)
print(largest_std_dev)ChatBots solution for the last task
# Calculate the required statistics by borough
# Group by 'borough' and aggregate the necessary values
borough_stats = schools.groupby("borough").agg(
num_schools=('school_name', 'count'), # Number of schools
average_SAT=('total_SAT', 'mean'), # Mean of total SAT
std_SAT=('total_SAT', 'std') # Standard deviation of total SAT
).round(2) # Round to two decimal places
# Step 3: Identify the borough with the largest standard deviation
# Sort by std_SAT in descending order and select the first row (largest std)
largest_std_dev_row = borough_stats.sort_values(by="std_SAT", ascending=False).head(1)
# Step 4: Create the output DataFrame 'largest_std_dev' with required columns
largest_std_dev = pd.DataFrame({
'borough': largest_std_dev_row.index, # Borough name
'num_schools': largest_std_dev_row['num_schools'].values, # Number of schools
'average_SAT': largest_std_dev_row['average_SAT'].values, # Average SAT
'std_SAT': largest_std_dev_row['std_SAT'].values # Standard deviation of SAT
})
# Display the result
print(largest_std_dev)