Photo by Jannis Lucas on Unsplash.

Every year, American high school students take the SAT, a standardized test intended to measure literacy, numeracy, and writing skills. It has three sections (reading, math, and writing), each with a maximum score of 800 points. These tests matter to both students and colleges because they play a pivotal role in the admissions process.

Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.

You have been provided with a dataset called schools.csv, which is previewed below.

You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.

# Re-run this cell 
import pandas as pd

# Read in the data
schools = pd.read_csv("schools.csv")

# Preview the data
schools.head()

# Threshold for the best math results: 80% of the 800-point maximum (640)
best_results = (80/100)*800

# Subset schools whose average math score is at or above the threshold
over_average = schools[schools['average_math'] >= best_results]
school_name_math_only = over_average[['school_name', 'average_math']]

# Sort by 'average_math' in descending order and store in best_math_schools
best_math_schools = school_name_math_only.sort_values('average_math', ascending=False)

# Print the results
print(best_math_schools)
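
As a quick sanity check of the filtering step, the sketch below applies the same 80% threshold to a small made-up DataFrame. The school names and scores are invented for illustration and do not come from schools.csv; `toy`, `threshold`, and `toy_best` are hypothetical names. It mainly shows that `>=` keeps a school sitting exactly on the 640-point boundary.

```python
import pandas as pd

# Hypothetical toy data (not from schools.csv), used only to illustrate the filter
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [700, 640, 639, 512],
})

# 80% of the 800-point maximum for one section
threshold = 80 * 800 / 100

# Keep schools at or above the threshold, highest score first
toy_best = (
    toy.loc[toy["average_math"] >= threshold, ["school_name", "average_math"]]
       .sort_values("average_math", ascending=False)
)
print(toy_best)
```

School B (exactly 640) is kept, while School C (639) is dropped.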

# Total the three SAT section scores for each school
schools['total_SAT'] = schools['average_math'] + schools['average_reading'] + schools['average_writing']

# Sort the DataFrame by total_SAT, highest first
sorted_schools = schools.sort_values('total_SAT', ascending=False)

# Take the first 10 rows of sorted_schools as the top 10
top_10 = sorted_schools.head(10)

# Keep only school_name and total_SAT in top_10_schools
top_10_schools = top_10[['school_name', 'total_SAT']]

# Print top_10_schools
print(top_10_schools)
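
The sort-then-slice pattern above could also be written with `DataFrame.nlargest`, which picks the top rows by a column in one call. The sketch below shows this on invented data (the schools and scores are hypothetical, and it takes the top 2 rather than the top 10 to keep the example small); it is an alternative illustration, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical toy data (not from schools.csv)
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "average_math": [700, 500, 600],
    "average_reading": [650, 520, 640],
    "average_writing": [660, 510, 650],
})

# Total the three section scores, as in the cell above
toy["total_SAT"] = toy["average_math"] + toy["average_reading"] + toy["average_writing"]

# nlargest returns the n rows with the highest total_SAT, already sorted descending
toy_top_2 = toy.nlargest(2, "total_SAT")[["school_name", "total_SAT"]]
print(toy_top_2)
```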

# Group by borough and compute the school count, mean, and standard deviation of total_SAT
grouped_schools = schools.groupby('borough').agg(num_schools=('school_name', 'count'),
                                                 average_SAT=('total_SAT', 'mean'),
                                                 std_SAT=('total_SAT', 'std'))

# Round average_SAT and std_SAT to two decimal places
grouped_schools['average_SAT'] = grouped_schools['average_SAT'].round(2)
grouped_schools['std_SAT'] = grouped_schools['std_SAT'].round(2)

# Find the borough with the largest standard deviation
# and store its row (a Series) in largest_std_dev_series
largest_std_dev_series = grouped_schools.loc[grouped_schools['std_SAT'].idxmax()]

# Convert the Series into a one-row DataFrame with 'borough' as a regular column
largest_std_dev = largest_std_dev_series.to_frame().T
largest_std_dev = largest_std_dev.reset_index().rename(columns={'index': 'borough'})
print(largest_std_dev)
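
To make the grouping step easier to follow, here is a minimal sketch on invented data with two hypothetical boroughs. It uses the same named aggregations and `idxmax`, but selects the row with a list inside `.loc` so the result stays a DataFrame and the columns keep their numeric dtypes. Note that pandas computes `std` as the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical toy data (not from schools.csv)
toy = pd.DataFrame({
    "borough": ["Borough X", "Borough X", "Borough Y", "Borough Y"],
    "school_name": ["School A", "School B", "School C", "School D"],
    "total_SAT": [1200, 1800, 1500, 1520],
})

# Same named aggregations as above, rounded to two decimal places
toy_grouped = toy.groupby("borough").agg(
    num_schools=("school_name", "count"),
    average_SAT=("total_SAT", "mean"),
    std_SAT=("total_SAT", "std"),
).round(2)

# Index label (borough) of the row with the largest standard deviation
widest = toy_grouped["std_SAT"].idxmax()

# Passing a list to .loc returns a one-row DataFrame instead of a Series
toy_largest = toy_grouped.loc[[widest]].reset_index()
print(toy_largest)
```

Borough X has the wider spread here because its two scores sit 600 points apart, against 20 points for Borough Y.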