Measles
This dataset contains overall and measles, mumps, and rubella (MMR) immunization rates for schools across the United States. Each row corresponds to one school and includes variables such as its latitude, longitude, name, and vaccination rates.
Not sure where to begin? Scroll to the bottom to find challenges!
import pandas as pd
pd.read_csv("data/measles.csv")
Data Dictionary
| Column | Explanation |
|---|---|
| index | Index ID |
| state | School's state |
| year | School academic year |
| name | School name |
| type | Whether a school is public, private, or charter |
| city | City |
| county | County |
| district | School district |
| enroll | Enrollment |
| mmr | School's Measles, Mumps, and Rubella (MMR) vaccination rate |
| overall | School's overall vaccination rate |
| xrel | Percentage of students exempted from vaccination for religious reasons |
| xmed | Percentage of students exempted from vaccination for medical reasons |
| xper | Percentage of students exempted from vaccination for personal reasons |
Don't know where to start?
Challenges are brief tasks designed to help you practice specific skills:
- 🗺️ Explore: What types of schools have the highest overall and mmr vaccination rates?
- 📊 Visualize: Create a plot that visualizes the overall and mmr vaccination rates for the ten states with the highest number of schools.
- 🔎 Analyze: Does location affect the vaccination percentage of a school?
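One possible starting point for the Explore challenge is a grouped summary by school type. The sketch below is only an illustration: the helper name `rates_by_type` and the tiny `toy` frame are made up here, and the real analysis would pass in the dataframe loaded from `data/measles.csv` instead.

```python
import pandas as pd

def rates_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, and count of both rates per school type, highest mean overall rate first."""
    return (
        df.groupby("type")[["overall", "mmr"]]
        .agg(["mean", "median", "count"])
        .sort_values(("overall", "mean"), ascending=False)
    )

# Tiny made-up frame, only to show the shape of the result (not real data)
toy = pd.DataFrame({
    "type": ["Public", "Public", "Private", "Charter"],
    "overall": [95.0, 97.0, 88.0, 91.0],
    "mmr": [96.0, 98.0, 90.0, 92.0],
})
summary = rates_by_type(toy)
print(summary)
```

The `count` column is worth reading alongside the means, since a school type with only a handful of rows can rank high or low by chance.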
Scenarios are broader questions to help you develop an end-to-end project for your portfolio:
You are working for a public health organization. The organization has a problem: this year, the overall vaccination rate information for schools is not yet available. To gain an initial idea of the rates, your manager has asked you whether it is possible to use other data to predict the overall vaccination rate of a school. This includes such information as the mmr vaccination rate, the location, and the type of school. Your manager also wants to know how reliable your predictions are.
You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.
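A minimal sketch of how the scenario could be approached, assuming a plain least-squares model is an acceptable first baseline. The helper `holdout_mae` and the synthetic `toy` data are hypothetical and exist only for illustration. The sketch holds out a random quarter of the rows and compares the model's mean absolute error with that of always predicting the training mean, which gives one rough answer to "how reliable are the predictions?".

```python
import numpy as np
import pandas as pd

def holdout_mae(df, features=("mmr", "type", "state"), target="overall",
                test_frac=0.25, seed=0):
    """Fit least squares on a random train split; return (model MAE, baseline MAE) on held-out rows."""
    data = df[list(features) + [target]].dropna()
    # One-hot encode categorical columns; numeric columns pass through unchanged
    X = pd.get_dummies(data[list(features)], drop_first=True).astype(float)
    X.insert(0, "intercept", 1.0)
    y = data[target].to_numpy(dtype=float)

    # Random train/test split
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_test = max(1, int(len(data) * test_frac))
    test_idx, train_idx = order[:n_test], order[n_test:]

    Xv = X.to_numpy()
    coef, *_ = np.linalg.lstsq(Xv[train_idx], y[train_idx], rcond=None)
    pred = Xv[test_idx] @ coef

    model_mae = float(np.mean(np.abs(pred - y[test_idx])))
    # Baseline: always predict the mean of the training targets
    baseline_mae = float(np.mean(np.abs(y[train_idx].mean() - y[test_idx])))
    return model_mae, baseline_mae

# Synthetic data (not real): overall tracks mmr with a small offset and noise
rng = np.random.default_rng(42)
n = 200
toy = pd.DataFrame({
    "mmr": rng.uniform(80, 100, n),
    "type": rng.choice(["Public", "Private"], n),
    "state": rng.choice(["StateA", "StateB"], n),
})
toy["overall"] = toy["mmr"] - 2.0 + rng.normal(0, 0.5, n)

model_mae, baseline_mae = holdout_mae(toy)
print(f"model MAE: {model_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

A random split assumes the held-out schools resemble the training schools. If the goal is to predict a future year, holding out the most recent `year` instead may give a more honest estimate.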
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the dataframe
df = pd.read_csv('data/measles.csv')
# Check the first 5 rows of the dataframe
df.head()
# Check the shape of the dataframe
df.shape
# Check the data types of the columns
df.dtypes
# Check for missing values
df.isnull().sum()
# Check the summary statistics of the numerical columns
df.describe()
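If the summary statistics show rates below 0 or above 100, those values are probably placeholders for missing data rather than real percentages. That is an assumption to verify against the data, not something the data dictionary states. A minimal sketch of masking such values follows; the helper `mask_invalid_rates` and the `toy` frame are made up for illustration.

```python
import pandas as pd

# Columns the data dictionary describes as rates or percentages
RATE_COLUMNS = ["mmr", "overall", "xrel", "xmed", "xper"]

def mask_invalid_rates(df, columns=RATE_COLUMNS, low=0.0, high=100.0):
    """Return a copy where values outside [low, high] in the given columns become NaN."""
    cleaned = df.copy()
    for col in columns:
        if col in cleaned.columns:
            values = pd.to_numeric(cleaned[col], errors="coerce")
            cleaned[col] = values.where(values.between(low, high))
    return cleaned

# Tiny made-up frame with two kinds of implausible values (not real data)
toy = pd.DataFrame({"mmr": [95.0, -1.0, 101.0], "overall": [93.0, 88.0, -1.0]})
cleaned = mask_invalid_rates(toy)
print(cleaned.isna().sum())
```

Masking rather than dropping rows keeps a school available for analyses that only need the columns that are valid.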
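# Visualize the distribution of the numerical columns, one figure per column
# (drawing all three on the same axes would overlay unrelated scales)
for col in ['overall', 'mmr', 'enroll']:
    plt.figure()
    sns.histplot(df[col])
    plt.title(f'Distribution of {col}')
plt.show()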
# Visualize pairwise relationships between the numerical columns
sns.pairplot(df[['overall', 'mmr', 'enroll']])
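# Visualize the relationship between overall vaccination rate and location
# (grouped by state; one box per city would be too many categories to read)
plt.figure(figsize=(12, 6))
sns.boxplot(x='state', y='overall', data=df)
plt.xticks(rotation=90)
plt.show()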
# Visualize the relationship between overall vaccination rate and school type
sns.boxplot(x='type', y='overall', data=df)
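For the Visualize challenge, one way to pick the ten states with the most schools and compare their average rates is sketched below. The helper `top_state_rates` and the `toy` frame are hypothetical. The sketch also assumes that each row is one school, so a state's row count stands in for its number of schools; if the data holds several years per school, that count would need deduplicating first.

```python
import matplotlib.pyplot as plt
import pandas as pd

def top_state_rates(df, n=10):
    """Mean overall and mmr rates for the n states with the most rows, largest state first."""
    top_states = df["state"].value_counts().head(n).index
    return (
        df[df["state"].isin(top_states)]
        .groupby("state")[["overall", "mmr"]]
        .mean()
        .loc[top_states]  # reorder from alphabetical to row-count order
    )

# Tiny made-up frame (not real data); n=2 keeps the example readable
toy = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "C"],
    "overall": [90.0, 92.0, 94.0, 80.0, 84.0, 99.0],
    "mmr": [91.0, 93.0, 95.0, 82.0, 86.0, 99.0],
})
summary = top_state_rates(toy, n=2)

# Grouped bar chart: one pair of bars per state
ax = summary.plot.bar(figsize=(10, 5))
ax.set_xlabel("State")
ax.set_ylabel("Mean vaccination rate (%)")
plt.tight_layout()
```

With the real data, calling `top_state_rates(df)` uses the default of ten states.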