Photo by Jannis Lucas on Unsplash.
Every year, American high school students take the SAT, a standardized test intended to measure literacy, numeracy, and writing skills. It has three sections (reading, math, and writing), each with a maximum score of 800 points. These scores matter to students and colleges alike because they play a pivotal role in the admissions process.
Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.
You have been provided with a dataset called schools.csv, which is previewed below.
You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.
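The scoring rules described above imply a little arithmetic that the analysis relies on. The sketch below is illustrative only: the constant names are made up here, and the 80% figure is simply what the 640 math cutoff used later in this notebook works out to.

```python
# Hypothetical constants for illustration; these names are not part of the dataset
MAX_SECTION_SCORE = 800   # maximum for each of reading, math, and writing
NUM_SECTIONS = 3

# Highest possible combined score across all three sections
max_total_score = MAX_SECTION_SCORE * NUM_SECTIONS

# The 640 math cutoff expressed as a share of the section maximum
math_cutoff = 640
cutoff_share = math_cutoff / MAX_SECTION_SCORE

print(max_total_score)
print(cutoff_share)
```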
# Re-run this cell
import pandas as pd
# Read in the data
schools = pd.read_csv("schools.csv")
# Work on a copy named df instead of reading the file a second time
df = schools.copy()
# Preview the data
df.head()
print(df)
df.shape
df.info()
df.dtypes
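# Round-trip school_name through the category dtype and back to object;
# the values are left unchanged, so this pair is only a dtype experiment
df["school_name"] = df["school_name"].astype("category")
df["school_name"] = df["school_name"].astype("object")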
df.nunique()
df['borough'].unique()
obj_column=df.select_dtypes(include="object")
obj_column
df.isnull()
df.isnull().sum()
column_name = df.columns
num_null = df.isnull().sum()
# Print each column name next to its missing-value count
for name in column_name:
    print(name, num_null[name])
df.describe()
# Keep schools whose average math score is at least 640 (80% of the 800-point maximum)
filtered_data = df[df['average_math'] >= 640]
best_math_schools = filtered_data.sort_values(by='average_math', ascending=False)[["school_name", "average_math"]]
best_math_schools = best_math_schools.reset_index(drop=True)
best_math_schools
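As a quick check of the filter-then-sort pattern, here is a minimal sketch on a tiny hand-made table. The school names and scores are invented for illustration and are not taken from schools.csv; only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Invented rows for illustration only; not real schools or scores
toy = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [655, 610, 700, 640],
})

# Keep rows at or above the 640 cutoff, then sort from highest to lowest
toy_best_math = (
    toy[toy["average_math"] >= 640]
    .sort_values(by="average_math", ascending=False)
    .reset_index(drop=True)
)
print(toy_best_math)
```

Because the comparison is `>=`, a school sitting exactly on the cutoff (School D here) is kept.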
# Combine the three section averages into a single total score per school
df['total_SAT'] = df['average_math'] + df['average_reading'] + df['average_writing']
top_10_schools = df.sort_values(by='total_SAT', ascending=False)[["school_name", "total_SAT"]].head(10)
top_10_schools = top_10_schools.reset_index(drop=True)
top_10_schools
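# Per-borough school count, mean, and standard deviation of total_SAT
borough_sum = df.groupby('borough')['total_SAT'].agg(['count', 'mean', 'std']).round(2)
# Select the borough with the largest standard deviation; copy before renaming columns
largest_std_dev = borough_sum[borough_sum['std'] == borough_sum['std'].max()].copy()
largest_std_dev.columns = ['num_schools', 'average_SAT', 'std_SAT']
largest_std_dev = largest_std_dev.reset_index()
largest_std_dev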
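One detail worth keeping in mind is that pandas computes `std` as the sample standard deviation (ddof=1) by default. The sketch below runs the same kind of groupby aggregation on invented numbers and uses `idxmax` as an alternative way to pick the group with the largest spread. This is a swapped-in technique for illustration, not the approach used in the cell above, and the borough labels and scores are made up.

```python
import pandas as pd

# Invented data for illustration only; "X" and "Y" are placeholder borough labels
toy = pd.DataFrame({
    "borough": ["X", "X", "X", "Y", "Y", "Y"],
    "total_SAT": [1200, 1500, 1800, 1400, 1500, 1600],
})

# Count, mean, and sample standard deviation (ddof=1) per borough
toy_summary = toy.groupby("borough")["total_SAT"].agg(["count", "mean", "std"]).round(2)

# Index label of the borough with the largest standard deviation
toy_widest = toy_summary["std"].idxmax()

# Keep that single row as a DataFrame and give the columns descriptive names
toy_row = (
    toy_summary.loc[[toy_widest]]
    .rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"})
    .reset_index()
)
print(toy_row)
```

Note that `idxmax` returns only the first label when several groups tie for the maximum, whereas filtering with `== ...max()` keeps every tied row.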