Exploratory Data Analysis in Python

    Run the code cell below to import the packages and datasets used in this course.

    # Importing the course packages
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import scipy.stats
    import scipy.interpolate
    import statsmodels.formula.api as smf
    
    # Importing the course datasets
    brfss = pd.read_hdf('datasets/brfss.hdf5', 'brfss') # Behavioral Risk Factor Surveillance System (BRFSS) 
    gss = pd.read_hdf('datasets/gss.hdf5', 'gss') # General Social Survey (GSS) 
    nsfg = pd.read_hdf('datasets/nsfg.hdf5', 'nsfg') # National Survey of Family Growth (NSFG)

    Take Notes

    Add notes about the concepts you've learned and code cells with code you want to keep.

    Add your notes here

    # Add your code snippets here

    Explore Datasets

    Use the DataFrames imported in the first cell to explore the data and practice your skills!

    • Begin by finding the number of rows and columns and listing the column names of each DataFrame. Rename any columns whose names are hard to read.
    • Experiment by computing a correlation matrix for the numeric variables in nsfg.
    • Compute the simple linear regression of WTKG3 (weight in kg) on HTM4 (height in cm) in brfss, or of any other pair of variables you are interested in. Then compute the line of best fit and plot it over the data. If the fit doesn't look good, try a non-linear model.
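    For the first task, a minimal sketch: `DataFrame.shape` gives the row and column counts, `DataFrame.columns` gives the names, and `DataFrame.rename(columns=...)` returns a copy with new names. The helper `summarize_shape`, the mapping `READABLE_NAMES`, and the readable names themselves are hypothetical choices, and the tiny DataFrame stands in for the course files so the sketch runs on its own.

```python
import pandas as pd


def summarize_shape(df: pd.DataFrame, name: str) -> tuple:
    """Print the row count, column count, and column names of df."""
    n_rows, n_cols = df.shape
    print(f"{name}: {n_rows} rows, {n_cols} columns")
    print("columns:", list(df.columns))
    return n_rows, n_cols


# Hypothetical mapping from BRFSS variable codes to readable names,
# assuming HTM4 is height in cm and WTKG3 is weight in kg.
READABLE_NAMES = {"HTM4": "height_cm", "WTKG3": "weight_kg"}

# Tiny made-up frame so the sketch runs without the course files.
sample = pd.DataFrame({
    "HTM4": [170.0, 160.0],
    "WTKG3": [70.0, 55.0],
    "other_col": [1, 2],
})
shape = summarize_shape(sample, "sample")

# rename() leaves columns that are not in the mapping unchanged.
sample_readable = sample.rename(columns=READABLE_NAMES)
summarize_shape(sample_readable, "sample_readable")
```

    With the course data, looping over `[("brfss", brfss), ("gss", gss), ("nsfg", nsfg)]` and calling `summarize_shape(df, name)` covers all three DataFrames. Because `rename` returns a new DataFrame, assign the result back, for example `brfss = brfss.rename(columns=READABLE_NAMES)`.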
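    For the correlation matrix, a sketch assuming Pearson correlation (the pandas default) is what you want. It keeps only numeric columns before calling `DataFrame.corr()`, since recent pandas versions raise an error when `corr()` meets string columns. The helper `correlation_matrix` and the simulated columns `x`, `y`, `z`, and `label` are hypothetical stand-ins for nsfg variables.

```python
import numpy as np
import pandas as pd


def correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation for every pair of numeric columns in df."""
    # Keep numeric columns only; non-numeric columns would make corr() fail
    # in recent pandas versions.
    numeric = df.select_dtypes(include="number")
    # corr() excludes missing values pair by pair.
    return numeric.corr()


# Simulated stand-in for nsfg (hypothetical columns and values).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
sim = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),  # strongly related to x
    "z": rng.normal(size=200),                     # unrelated to x
    "label": ["a"] * 200,                          # non-numeric, dropped
})
corr = correlation_matrix(sim)
print(corr.round(2))
```

    On the course data, `correlation_matrix(nsfg)` can be passed to `sns.heatmap(corr, vmin=-1, vmax=1, cmap="coolwarm")` for a color-coded view. Pearson's coefficient only measures linear association, so a value near 0 does not rule out a strong non-linear relationship. Identifier columns such as `caseid` (if present) give meaningless correlations and are worth dropping first.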
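    For the regression, a minimal sketch using `scipy.stats.linregress`, one of the packages imported above. It assumes HTM4 is height in centimeters and WTKG3 is weight in kilograms. Rows with a missing height or weight are dropped first because `linregress` does not skip NaN values. The helper `fit_line` is hypothetical, and the DataFrame is simulated rather than real survey data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress


def fit_line(df: pd.DataFrame, x: str, y: str):
    """Regress column y on column x after dropping rows missing either one."""
    subset = df.dropna(subset=[x, y])  # linregress does not skip NaN
    return linregress(subset[x], subset[y]), subset


# Simulated stand-in for brfss (hypothetical values): weight rises about
# 0.9 kg per cm of height, plus noise, with one missing weight.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=500)
weight = -80 + 0.9 * height + rng.normal(0, 8, size=500)
sim_brfss = pd.DataFrame({"HTM4": height, "WTKG3": weight})
sim_brfss.loc[0, "WTKG3"] = np.nan

res, subset = fit_line(sim_brfss, "HTM4", "WTKG3")
print(f"slope={res.slope:.3f}, intercept={res.intercept:.2f}, r={res.rvalue:.3f}")

# Line of best fit evaluated at the ends of the observed height range.
fx = np.array([subset["HTM4"].min(), subset["HTM4"].max()])
fy = res.intercept + res.slope * fx

# Small, transparent markers reduce overplotting of many points.
fig, ax = plt.subplots()
ax.plot(subset["HTM4"], subset["WTKG3"], "o", alpha=0.1, markersize=2)
ax.plot(fx, fy, "-", color="C1")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
```

    With the course data, `res, subset = fit_line(brfss, "HTM4", "WTKG3")` gives the slope in kilograms per centimeter. Two points are enough to draw a straight line, so evaluating the fit only at the minimum and maximum height is sufficient.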
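    If the straight line misses a visible curve, one simple non-linear option is a quadratic fit. The sketch below fits degree-1 and degree-2 polynomials with `np.polyfit` and compares their R² values, assuming R² is an adequate first check. The helpers `compare_fits` and `r_squared` are hypothetical, and the curved data is simulated.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot


def compare_fits(x, y):
    """Fit degree-1 and degree-2 polynomials; return the R^2 of each."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {}
    for degree in (1, 2):
        coefs = np.polyfit(x, y, deg=degree)  # highest power first
        scores[degree] = r_squared(y, np.polyval(coefs, x))
    return scores


# Hypothetical curved data: y grows with the square of x.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=300)
y = 2 + 0.5 * x**2 + rng.normal(0, 2, size=300)
scores = compare_fits(x, y)
print(scores)
```

    A degree-2 model always scores at least as well as a straight line on the data it was fit to, because the line is a special case of it, so look for a clear gain and check the plotted curve rather than trusting a tiny increase. With the statsmodels interface imported above, a comparable model can be written as `smf.ols("WTKG3 ~ HTM4 + I(HTM4 ** 2)", data=brfss).fit()`.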