Exploring Air Force Personnel Data with Python

Exploring Air Force personnel data

In this code-along, we'll explore a US Department of Defense personnel demographics dataset. The publicly available dataset was taken from data.gov and has already been cleaned and tidied, so you can get straight into exploratory data analysis.

The dataset contains counts of military personnel by gender, race, and paygrade. It was compiled in March 2010.

We'll do the first few tasks together, then you can try some tasks yourself.

You can consult the solution in the file browser, under notebook-solution.ipynb.

1: Import the packages

Today, we'll be using pandas for data manipulation and calculations, and plotly.express for visualization.

  • Import the pandas package using the alias pd.
  • Import the plotly.express package using the alias px.
# Import the pandas package
import pandas as pd

# Import the plotly express package
import plotly.express as px

2: Read in the dataset

The demographics dataset is contained in a CSV file named "dod_demographics.csv".

  • Use pandas to read this CSV file. Assign it to a variable named dod_demographics.
# Import the demographic data from "dod_demographics.csv"
dod_demographics = pd.read_csv('dod_demographics.csv')

# See the result
dod_demographics
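
After reading a CSV, it can help to check its shape and column types before analyzing it. The sketch below is only an illustration: it reads a tiny in-memory CSV whose rows are made up for the example (they are not real counts from "dod_demographics.csv"), and the names toy_csv and toy are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for "dod_demographics.csv": made-up rows, not real counts
toy_csv = io.StringIO(
    "service,gender,race,hispanicity,paygrade,count\n"
    "Air Force,MALE,WHITE,NON-HISP,E05,10\n"
    "Army,FEMALE,ASIAN,HISP,O01,3\n"
)

# read_csv accepts a file path or a file-like object such as StringIO
toy = pd.read_csv(toy_csv)

# Quick structural checks before any analysis
print(toy.shape)   # (number of rows, number of columns)
print(toy.dtypes)  # count should be numeric; the other columns hold text
print(toy.head())  # first few rows
```

The same three checks should work on dod_demographics once the real file has been read.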

The dataset has 6 columns.

  • service: Army, Navy, Marine Corps, Air Force, Coast Guard. (Space Force didn't exist when the dataset was compiled.)
  • gender: MALE or FEMALE.
  • race: AMI/ALN, ASIAN, BLACK, MULTI, P/I, WHITE, UNK.
  • hispanicity: HISP, NON-HISP.
  • paygrade: Enlisted grades E00 to E09, Warrant Officer grades W01 to W05, Officer grades O01 to O10.
  • count: number of personnel in that demographic.
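
One way to confirm the column descriptions above is to list the distinct labels in each column. Here is a minimal sketch on a toy DataFrame; the rows and counts are invented for illustration, and toy, categorical_cols, and labels are hypothetical names rather than part of the dataset.

```python
import pandas as pd

# Made-up rows with the same six columns as the dataset (not real counts)
toy = pd.DataFrame({
    "service": ["Air Force", "Air Force", "Navy"],
    "gender": ["MALE", "FEMALE", "MALE"],
    "race": ["WHITE", "BLACK", "ASIAN"],
    "hispanicity": ["NON-HISP", "HISP", "NON-HISP"],
    "paygrade": ["E05", "O01", "W02"],
    "count": [10, 4, 7],
})

# Distinct labels found in each categorical column
categorical_cols = ["service", "gender", "race", "hispanicity", "paygrade"]
labels = {col: sorted(toy[col].unique()) for col in categorical_cols}
for col, values in labels.items():
    print(col, values)

# Each row is a headcount for one demographic group, so totals come from
# summing the count column rather than counting rows
total_personnel = toy["count"].sum()
print(total_personnel)
```

Because each row stores a count rather than one person, most later calculations are likely to sum count within groups instead of counting rows.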

3: Get the Air Force subset of the dataset

The dataset contains data for all the services, but we only want to analyze the Air Force data.

  • Query dod_demographics for rows where the service is equal to "Air Force". Assign to air_force.
# Query dod_demographics for rows in the "Air Force" service
air_force = dod_demographics[dod_demographics['service'] == 'Air Force']

# See the results
air_force
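
As a sketch of how this filter behaves, the example below applies the same boolean-mask pattern to a toy DataFrame (made-up rows, hypothetical names) and compares it with the equivalent DataFrame.query() call. Either form should give the same subset; the mask version is the one used in this code-along.

```python
import pandas as pd

# Made-up rows for illustration only (not real counts)
toy = pd.DataFrame({
    "service": ["Air Force", "Navy", "Air Force", "Army"],
    "paygrade": ["E05", "E05", "O01", "W02"],
    "count": [10, 7, 4, 2],
})

# Boolean mask: keep rows where the comparison is True
mask_subset = toy[toy["service"] == "Air Force"]

# Equivalent filter written as a query string
query_subset = toy.query("service == 'Air Force'")

# Sanity checks: both approaches agree, and only one service remains
print(mask_subset.equals(query_subset))
print(mask_subset["service"].unique())
```

Note that the filtered rows keep their original index labels; calling reset_index(drop=True) on the result renumbers them from zero if that is more convenient.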