    Exploring Air Force personnel data

    In this code-along, we'll explore a US Department of Defense personnel demographics dataset. The dataset is publicly available from data.gov and has been cleaned and tidied, so you can get straight into exploratory data analysis.

    The dataset contains counts of military personnel by gender, race, and paygrade. It was compiled in March 2010.

    We'll do the first few tasks together, then you can try some tasks yourself.

    1: Import the packages

    Today, we'll be using pandas for data manipulation and calculations, and plotly.express for visualization.

    • Import the pandas package using the alias pd.
    • Import the plotly.express package using the alias px.
    # Import the pandas package
    import pandas as pd
    
    # Import the plotly express package
    import plotly.express as px

    2: Read in the dataset

    The demographics dataset is contained in a CSV file named "dod_demographics.csv".

    • Use pandas to read this CSV file. Assign it to a variable named dod_demographics.
    # Import the demographic data from "dod_demographics.csv"
    dod_demographics = pd.read_csv("dod_demographics.csv")
    
    # See the result
    dod_demographics
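
    After reading a CSV, it can help to check its shape and column types before going further. The sketch below is only an illustration: it reads a tiny made-up stand-in for the file from an in-memory string (the three rows and their counts are invented, not taken from "dod_demographics.csv"), so it runs without the real file. The names sample_csv and sample are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for "dod_demographics.csv"; these rows are invented.
sample_csv = io.StringIO(
    "service,gender,race,hispanicity,paygrade,count\n"
    "Army,MALE,WHITE,NON-HISP,E01,10\n"
    "Air Force,FEMALE,ASIAN,HISP,O01,2\n"
    "Navy,MALE,BLACK,NON-HISP,W02,3\n"
)

# pd.read_csv accepts a file path or a file-like object
sample = pd.read_csv(sample_csv)

# Number of rows and columns
print(sample.shape)

# Column names and the data types pandas inferred
print(sample.dtypes)

# First few rows
print(sample.head())
```

    The same three checks should work on dod_demographics once the real file has been read.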

    The dataset has 6 columns.

    • service: Army, Navy, Marine Corps, Air Force, Coast Guard. (Space Force didn't exist when the dataset was compiled.)
    • gender: MALE or FEMALE.
    • race: AMI/ALN, ASIAN, BLACK, MULTI, P/I, WHITE, UNK.
    • hispanicity: HISP, NON-HISP.
    • paygrade: Enlisted grades E00 to E09, Warrant Officer grades W01 to W05, Officer grades O01 to O10.
    • count: number of personnel in that demographic.
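
    One way to confirm the column descriptions above is to list the distinct values in each categorical column and to total the count column. The sketch below uses a small made-up DataFrame with the same six columns (the rows and counts are invented for illustration); sample, paygrade_group, and total_by_service are hypothetical names.

```python
import pandas as pd

# Hypothetical rows with the same six columns; the counts are invented.
sample = pd.DataFrame({
    "service": ["Army", "Air Force", "Air Force", "Navy"],
    "gender": ["MALE", "FEMALE", "MALE", "FEMALE"],
    "race": ["WHITE", "ASIAN", "BLACK", "UNK"],
    "hispanicity": ["NON-HISP", "HISP", "NON-HISP", "NON-HISP"],
    "paygrade": ["E01", "O01", "E05", "W02"],
    "count": [10, 2, 7, 3],
})

# Distinct values in each categorical column
for column in ["service", "gender", "race", "hispanicity"]:
    print(column, sorted(sample[column].unique()))

# The first letter of paygrade separates Enlisted (E),
# Warrant Officer (W), and Officer (O) grades
paygrade_group = sample["paygrade"].str[0]
print(paygrade_group.value_counts())

# Each row holds a number of personnel, so totals come from
# summing "count", not from counting rows
total_by_service = sample.groupby("service")["count"].sum()
print(total_by_service)
```

    Because each row is already an aggregate, most later calculations on this kind of data would sum count within groups rather than count rows.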

    3: Get the Air Force subset of the dataset

    The dataset contains data for all the services, but we only want to analyze the Air Force data.

    • Query dod_demographics for rows where the service is equal to "Air Force". Assign to air_force.
    # Query dod_demographics for rows in the "Air Force" service
    air_force = dod_demographics.query('service == "Air Force"')
    
    # See the results
    air_force
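
    The .query() call takes the filter as a string. The same rows can also be selected with a boolean mask, which some readers may find easier to debug. The sketch below compares the two on a small made-up DataFrame (rows and counts are invented; sample, air_force_query, and air_force_mask are hypothetical names).

```python
import pandas as pd

# Hypothetical rows; the counts are invented.
sample = pd.DataFrame({
    "service": ["Army", "Air Force", "Air Force", "Navy"],
    "paygrade": ["E01", "O01", "E05", "W02"],
    "count": [10, 2, 7, 3],
})

# Filter with a query string, as in the task above
air_force_query = sample.query('service == "Air Force"')

# Equivalent filter with a boolean mask
air_force_mask = sample[sample["service"] == "Air Force"]

# Check that both approaches select the same rows
print(air_force_query.equals(air_force_mask))

# Total personnel in the filtered rows
print(air_force_query["count"].sum())
```

    Both approaches keep the original row index, so the filtered rows are not renumbered from zero unless you call .reset_index(drop=True).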