
In this code-along, we'll explore a US Department of Defense personnel demographics dataset. The publicly available dataset was taken from data.gov and has been cleaned and tidied, so you can get straight into exploratory data analysis.

The dataset contains counts of military personnel by service, gender, race, hispanicity, and paygrade. It was compiled in March 2010.

We'll do the first few tasks together, then you can try some tasks yourself.

Exploring Air Force personnel data

Today, we'll be using pandas for data manipulation and calculations, and plotly.express for visualization.

1: Import the packages

  • Import the pandas package using the alias pd.
  • Import the plotly.express package using the alias px.
# Import the pandas package
import pandas as pd

# Import the plotly express package
import plotly.express as px

2: Read in the dataset

The demographics dataset is contained in a CSV file named "dod_demographics.csv".

  • Use pandas to read this CSV file. Assign it to a variable named dod_demographics.
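A minimal sketch of this step, assuming the CSV file sits in the current working directory. The helper `load_demographics` and its column check are hypothetical additions for illustration, not part of the original exercise. The essential call is just `pd.read_csv`, which returns a DataFrame.

```python
from pathlib import Path

import pandas as pd

# The six columns described for this dataset
EXPECTED_COLUMNS = ["service", "gender", "race", "hispanicity", "paygrade", "count"]


def load_demographics(path="dod_demographics.csv"):
    """Read the demographics CSV and check that the expected columns exist.

    Hypothetical helper for this sketch; the core step is pd.read_csv(path).
    """
    df = pd.read_csv(path)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    return df


# Assumes the file is in the working directory; skip quietly if it is not
if Path("dod_demographics.csv").exists():
    dod_demographics = load_demographics("dod_demographics.csv")
```

In the code-along itself, the one-liner `dod_demographics = pd.read_csv("dod_demographics.csv")` is enough; the column check simply fails early if the file doesn't match the description that follows.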

The dataset has six columns:

  • service: Army, Navy, Marine Corps, Air Force, Coast Guard. (Space Force didn't exist when the dataset was compiled.)
  • gender: MALE or FEMALE.
  • race: AMI/ALN, ASIAN, BLACK, MULTI, P/I, WHITE, UNK.
  • hispanicity: HISP, NON-HISP.
  • paygrade: Enlisted grades E00 to E09, Warrant Officer grades W01 to W05, Officer grades O01 to O10.
  • count: number of personnel in that demographic.
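Because each row holds a `count` of people rather than one person, totals come from summing `count`, not from counting rows. The sketch below shows this on a tiny DataFrame whose values are invented for illustration (they are not real DoD figures). It also derives a hypothetical `paygrade_group` column from the first letter of `paygrade` (E = Enlisted, W = Warrant Officer, O = Officer), following the grade ranges listed above.

```python
import pandas as pd

# Invented example rows for illustration only; not real DoD figures
sample = pd.DataFrame({
    "service": ["Air Force", "Air Force", "Air Force", "Navy"],
    "gender": ["MALE", "FEMALE", "MALE", "FEMALE"],
    "race": ["WHITE", "BLACK", "ASIAN", "WHITE"],
    "hispanicity": ["NON-HISP", "HISP", "NON-HISP", "NON-HISP"],
    "paygrade": ["E04", "E04", "O03", "W02"],
    "count": [120, 30, 15, 4],
})

# Map the first letter of each paygrade code to its category (hypothetical column)
PAYGRADE_GROUPS = {"E": "Enlisted", "W": "Warrant Officer", "O": "Officer"}
sample["paygrade_group"] = sample["paygrade"].str[0].map(PAYGRADE_GROUPS)

# Rows are demographic combinations; people are the sum of the count column
n_rows = len(sample)
n_people = sample["count"].sum()
print(n_rows, n_people)  # 4 169

# People per paygrade category: sum count within each group
people_by_group = sample.groupby("paygrade_group")["count"].sum()
print(people_by_group)
```

The same pattern applies to the real data: `len()` tells you how many demographic combinations there are, while `count.sum()` tells you how many personnel they represent.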

3: Subset the Air Force data

The dataset contains data for all the services, but we only want to analyze the Air Force data.

  • Query dod_demographics for rows where service is equal to "Air Force". Assign the result to air_force.
# Query dod_demographics for rows in the "Air Force" service
air_force = dod_demographics.query('service == "Air Force"')

# See the results
air_force
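`DataFrame.query` evaluates a string expression in which bare names refer to columns. As a sketch on a small DataFrame with invented values (not the real dataset), the same subset can be taken with a boolean mask; both approaches return identical rows and keep the original row labels.

```python
import pandas as pd

# Invented example rows for illustration only; not real DoD figures
df = pd.DataFrame({
    "service": ["Army", "Air Force", "Navy", "Air Force"],
    "paygrade": ["E01", "E04", "O02", "O03"],
    "count": [10, 20, 5, 7],
})

# String expression, as used in the code-along
via_query = df.query('service == "Air Force"')

# Equivalent boolean mask: keep rows where the service column matches
via_mask = df[df["service"] == "Air Force"]

# Both keep the original row labels (1 and 3 here)
print(via_query.index.tolist())  # [1, 3]
```

If you want the subset numbered from 0 again, `reset_index(drop=True)` discards the old labels.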

4: Start exploring! How much data do we have?