Exploratory Data Analysis in Python

    👋 Welcome to your workspace! Here, you can write and run Python code and add text in Markdown. Below, we've imported the packages used in the course Exploratory Data Analysis in Python and loaded the course datasets as DataFrames. This is your sandbox environment: analyze the course datasets further, take notes, or experiment with code!

    # Importing course packages; you can add more too!
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import scipy.stats
    import scipy.interpolate
    import statsmodels.formula.api as smf
    
    # Importing course datasets as DataFrames
    brfss = pd.read_hdf('datasets/brfss.hdf5', 'brfss') # Behavioral Risk Factor Surveillance System (BRFSS) 
    gss = pd.read_hdf('datasets/gss.hdf5', 'gss') # General Social Survey (GSS) 
    nsfg = pd.read_hdf('datasets/nsfg.hdf5', 'nsfg') # National Survey of Family Growth (NSFG) 
    
    brfss.head() # Display the first five rows
    # Begin writing your own code here!

    Don't know where to start?

    Try completing these tasks:

    • Begin by calculating the number of rows and columns and displaying the names of columns for each DataFrame. Change any column names for better readability.
    • Experiment and compute a correlation matrix for variables in nsfg.
    • Compute the simple linear regression of WTKG3 (weight) and HTM4 (height) in brfss (or any other variables you are interested in!). Then, compute the line of best fit and plot it. If the fit doesn't look good, try a non-linear model.
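
The three tasks above can be sketched as follows. This is a minimal, hedged example rather than the course's own solution: it runs on a small synthetic DataFrame that stands in for brfss, and the helper names `summarize` and `fit_line`, the new column names `height` and `weight`, and the simulated numbers are all assumptions made for illustration. The quadratic fit at the end is just one possible non-linear alternative.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def summarize(df, renames=None):
    """Optionally rename columns, then report the shape and column names."""
    renamed = df.rename(columns=renames or {})
    return renamed, renamed.shape, list(renamed.columns)


def fit_line(df, x_col, y_col):
    """Least-squares line of y_col on x_col, skipping rows with missing values."""
    subset = df.dropna(subset=[x_col, y_col])
    res = linregress(subset[x_col], subset[y_col])
    # Two endpoints are enough to draw a straight line of best fit.
    fx = np.array([subset[x_col].min(), subset[x_col].max()])
    fy = res.intercept + res.slope * fx
    return res, fx, fy


# Synthetic stand-in for brfss (assumption: NOT the real survey data).
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=500)
weight = -80 + 0.9 * height + rng.normal(0, 12, size=500)
weight[:5] = np.nan  # mimic missing survey responses
raw = pd.DataFrame({"HTM4": height, "WTKG3": weight})

# Task 1: rows, columns, and more readable column names.
demo, shape, names = summarize(raw, {"HTM4": "height", "WTKG3": "weight"})
print(shape, names)

# Task 2: correlation matrix over the numeric columns.
corr = demo.select_dtypes("number").corr()
print(corr)

# Task 3: simple linear regression and the endpoints of the fitted line.
res, fx, fy = fit_line(demo, "height", "weight")
print(res.slope, res.intercept)

# One possible non-linear alternative: a quadratic fit with np.polyfit.
clean = demo.dropna()
quad_coeffs = np.polyfit(clean["height"], clean["weight"], deg=2)
print(quad_coeffs)
```

To apply the same steps to the real data, pass `brfss` (or `nsfg`, `gss`) to the helpers in place of `raw`. To draw the fit, one option is a scatter plot such as `plt.plot(clean["height"], clean["weight"], "o", alpha=0.2)` followed by `plt.plot(fx, fy, "-")`.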