Exploring Air Force personnel data
In this code-along, we'll explore a US Department of Defense personnel demographics dataset. The publicly available dataset was taken from data.gov, and has been cleaned and tidied, so you can get straight into exploratory data analysis.
The dataset contains counts of military personnel by gender, race, and paygrade. It was compiled in March 2010.
We'll do the first few tasks together, then you can try some tasks yourself.
You can consult the solution in the file browser, under notebook-solution.ipynb.
1: Import the packages
Today, we'll be using pandas for data manipulation and calculations, and plotly.express for visualization.
- Import the pandas package using the alias pd.
- Import the plotly.express package using the alias px.
# Import the pandas package
import pandas as pd
# Import the plotly express package
import plotly.express as px
2: Read in the dataset
The demographics dataset is contained in a CSV file named "dod_demographics.csv".
- Use pandas to read this CSV file. Assign it to a variable named dod_demographics.
# Import the demographic data from "dod_demographics.csv"
dod_demographics = pd.read_csv("dod_demographics.csv")
# See the result
dod_demographics
The dataset has 6 columns.
- service: Army, Navy, Marine Corps, Air Force, Coast Guard. (Space Force didn't exist when the dataset was compiled.)
- gender: MALE or FEMALE.
- race: AMI/ALN, ASIAN, BLACK, MULTI, P/I, WHITE, UNK.
- hispanicity: HISP, NON-HISP.
- paygrade: Enlisted grades E00 to E09, Warrant Officer grades W01 to W05, Officer grades O01 to O10.
- count: number of personnel in that demographic.
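Before analyzing, it can help to confirm that the loaded data matches the column description above. The following is a minimal sketch, not part of the original code-along: it builds a tiny stand-in table from an in-memory CSV so it runs without the real file. The rows and counts in toy_csv are invented placeholders, and the names toy_csv, toy, and expected_columns are hypothetical.

```python
import io

import pandas as pd

# Hypothetical placeholder rows, NOT real values from dod_demographics.csv.
# Only the column names and category labels follow the description above.
toy_csv = io.StringIO(
    "service,gender,race,hispanicity,paygrade,count\n"
    "Air Force,MALE,WHITE,NON-HISP,E05,10\n"
    "Air Force,FEMALE,ASIAN,HISP,O03,2\n"
    "Navy,MALE,BLACK,NON-HISP,W02,3\n"
)
toy = pd.read_csv(toy_csv)

# Check that the six documented columns are present, in order
expected_columns = ["service", "gender", "race", "hispanicity", "paygrade", "count"]
columns_match = list(toy.columns) == expected_columns
print(columns_match)

# List the distinct values in a categorical column
services = sorted(toy["service"].unique())
print(services)

# The count column should hold integers
count_is_integer = pd.api.types.is_integer_dtype(toy["count"])
print(count_is_integer)
```

With the real data, the same checks would be applied to dod_demographics instead of toy.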
3: Get the Air Force subset of the dataset
The dataset contains data for all the services, but we only want to analyze the Air Force data.
- Query dod_demographics for rows where the service is equal to "Air Force". Assign to air_force.
# Query dod_demographics for rows in the "Air Force" service
air_force = dod_demographics.query('service == "Air Force"')
# See the results
air_force
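As a rough illustration of what the query does, the sketch below compares .query() with the equivalent boolean-mask filter on a small stand-in table. The toy DataFrame and its counts are invented placeholders rather than values from the real dataset, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical placeholder rows, NOT taken from dod_demographics.csv
toy = pd.DataFrame(
    {
        "service": ["Air Force", "Navy", "Air Force", "Army"],
        "paygrade": ["E05", "E05", "O03", "W02"],
        "count": [10, 3, 2, 4],
    }
)

# Filter with a query string, as in the task above
by_query = toy.query('service == "Air Force"')

# Filter with a boolean mask, which selects the same rows
by_mask = toy[toy["service"] == "Air Force"]

# Both approaches should return identical DataFrames
same_rows = by_query.equals(by_mask)
print(same_rows)

# Sanity check: the subset contains no other services
only_air_force = (by_query["service"] == "Air Force").all()
print(only_air_force)
```

Either form works here. The query string tends to read more cleanly when several conditions are combined, while the mask form avoids quoting the condition inside a string.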