Exploring Air Force personnel data
In this code-along, we'll explore a US Department of Defense personnel demographics dataset. The publicly available dataset was taken from data.gov, and has been cleaned and tidied, so you can get straight into exploratory data analysis.
The dataset contains counts of military personnel by gender, race, and paygrade. It was compiled in March 2010.
We'll do the first few tasks together, then you can try some tasks yourself.
1: Import the packages
Today, we'll be using pandas for data manipulation and calculations, and plotly.express for visualization.
- Import the pandas package using the alias pd.
- Import the plotly.express package using the alias px.
# Import the pandas package
import pandas as pd
# Import the plotly express package
import plotly.express as px
2: Read in the dataset
The demographics dataset is contained in a CSV file named "dod_demographics.csv".
- Use pandas to read this CSV file. Assign it to a variable named dod_demographics.
# Import the demographic data from "dod_demographics.csv"
dod_demographics = pd.read_csv("dod_demographics.csv")
# See the result
dod_demographics
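After reading a CSV file, it can help to check its shape and column types before going further. Here is a minimal sketch of that check. It assumes a tiny made-up CSV held in memory (the variable names sample_csv and sample are hypothetical, and the rows are not real data), so it runs without the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for "dod_demographics.csv": three made-up rows
# that reuse the six column names of the real file.
sample_csv = io.StringIO(
    "service,gender,race,hispanicity,paygrade,count\n"
    "Air Force,MALE,WHITE,NON-HISP,E05,100\n"
    "Air Force,FEMALE,ASIAN,NON-HISP,O03,10\n"
    "Navy,MALE,BLACK,HISP,E03,20\n"
)
sample = pd.read_csv(sample_csv)

# (number of rows, number of columns)
print(sample.shape)
# Data type that pandas inferred for each column
print(sample.dtypes)
# First few rows of the DataFrame
print(sample.head())
```

Note that the count column has to be accessed as sample["count"], because sample.count refers to the DataFrame's built-in count() method.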
The dataset has 6 columns.
- service: Army, Navy, Marine Corps, Air Force, Coast Guard. (Space Force didn't exist when the dataset was compiled.)
- gender: MALE or FEMALE.
- race: AMI/ALN, ASIAN, BLACK, MULTI, P/I, WHITE, UNK.
- hispanicity: HISP, NON-HISP.
- paygrade: Enlisted grades E00 to E09, Warrant Officer grades W01 to W05, Officer grades O01 to O10.
- count: number of personnel in that demographic.
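Because each paygrade code starts with E, W, or O, the leading letter is enough to tell enlisted personnel, warrant officers, and officers apart. The sketch below shows one possible way to do this with a handful of made-up codes. The names paygrades, rank_groups, and rank_group are hypothetical and are not part of the dataset.

```python
import pandas as pd

# Made-up paygrade codes in the format described above
paygrades = pd.Series(["E00", "E09", "W01", "W05", "O01", "O10"])

# Hypothetical lookup from the leading letter to a rank group
rank_groups = {"E": "Enlisted", "W": "Warrant Officer", "O": "Officer"}

# Take the first character of each code, then map it to its group
rank_group = paygrades.str[0].map(rank_groups)
print(rank_group.tolist())
```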
3: Get the Air Force subset of the dataset
The dataset contains data for all the services, but we only want to analyze the Air Force data.
- Query dod_demographics for rows where the service is equal to "Air Force". Assign to air_force.
# Query dod_demographics for rows in the "Air Force" service
air_force = dod_demographics.query('service == "Air Force"')
# See the results
air_force
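The same filter can also be written with a boolean mask instead of .query(). The sketch below compares the two approaches on a small made-up DataFrame (sample and its values are hypothetical, not rows from the real dataset).

```python
import pandas as pd

# Made-up rows with two of the dataset's column names
sample = pd.DataFrame({
    "service": ["Air Force", "Navy", "Air Force", "Army"],
    "count": [100, 20, 10, 50],
})

# Filter with a query string, as in the task above
by_query = sample.query('service == "Air Force"')

# Filter with a boolean mask: keep rows where the comparison is True
by_mask = sample[sample["service"] == "Air Force"]

# Check whether both approaches select identical rows
print(by_query.equals(by_mask))
```

Either way, the filtered rows keep their original index labels. Calling .reset_index(drop=True) on the result is one option if a fresh 0-based index is preferred.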