Skip to content

Photo by Jannis Lucas on Unsplash.

Every year, American high school students take SATs, which are standardized tests intended to measure literacy, numeracy, and writing skills. There are three sections - reading, math, and writing, each with a maximum score of 800 points. These tests are extremely important for students and colleges, as they play a pivotal role in the admissions process.

Analyzing the performance of schools is important for a variety of stakeholders, including policy and education professionals, researchers, government, and even parents considering which school their children should attend.

You have been provided with a dataset called schools.csv, which is previewed below.

You have been tasked with answering three key questions about New York City (NYC) public school SAT performance.

# Re-run this cell 
import pandas as pd

# Read in the data
schools = pd.read_csv("schools.csv")

# Preview the data
schools.head()

# Start coding here...
# Add as many cells as you like...

Identificar las escuelas con los mejores resultados en matemáticas

import pandas as pd

# Read in the data
schools = pd.read_csv("schools.csv")

# Filtrar las escuelas con puntajes de matemáticas >= 640
best_math_schools = schools[schools['average_math'] >= 640][['school_name', 'average_math']]

# Ordenar los resultados por average_math en orden descendente
best_math_schools = best_math_schools.sort_values(by='average_math', ascending=False).reset_index(drop=True)

# Mostrar los mejores resultados en matemáticas
print(best_math_schools)

Identificar las 10 mejores escuelas basadas en los puntajes combinados del SAT

# Calcular el puntaje total del SAT
schools['total_SAT'] = schools['average_math'] + schools['average_reading'] + schools['average_writing']

# Ordenar los resultados por total_SAT en orden descendente y seleccionar las 10 mejores escuelas
top_10_schools = schools[['school_name', 'total_SAT']].sort_values(by='total_SAT', ascending=False).head(10).reset_index(drop=True)

# Mostrar las 10 mejores escuelas
print(top_10_schools)

Encontrar el distrito con la mayor desviación estándar en el puntaje combinado del SAT

# Agrupar por distrito y calcular la desviación estándar y media del puntaje total del SAT
borough_stats = schools.groupby('borough').agg(
    num_schools=('school_name', 'count'),
    average_SAT=('total_SAT', 'mean'),
    std_SAT=('total_SAT', 'std')
).reset_index()

# Redondear los valores numéricos a dos decimales
borough_stats['average_SAT'] = borough_stats['average_SAT'].round(2)
borough_stats['std_SAT'] = borough_stats['std_SAT'].round(2)

# Encontrar el distrito con la mayor desviación estándar
largest_std_dev = borough_stats.loc[borough_stats['std_SAT'].idxmax()].to_frame().T

# Mostrar el distrito con la mayor desviación estándar
print(largest_std_dev)