Skip to content

Streamlined Data Ingestion with pandas

Run the hidden code cell below to import the packages used in this course.

# Create string of lettered columns to load
col_string = 'AD, AW:BA'

# Load data with skiprows and usecols set
survey_responses = pd.read_excel("fcc_survey_headers.xlsx", 
                        usecols=col_string, 
                        skiprows=2)

# View the names of the columns selected
print(survey_responses.columns
# Create an empty dataframe
all_responses = pd.DataFrame()

# Set up for loop to iterate through values in responses
for df in responses.values():
  # Print the number of rows being added
  print("Adding {} rows".format(df.shape[0]))
  # Append df to all_responses, assign result
  all_responses = all_responses.append(df)

# Graph employment statuses in sample
counts = all_responses.groupby("EmploymentStatus").EmploymentStatus.count()
counts.plot.barh()
plt.show()
# Set dtype to load appropriate column(s) as Boolean data
survey_data = pd.read_excel("fcc_survey_subset.xlsx",
                            dtype={'HasDebt':bool})

# View financial burdens by Boolean group
print(survey_data.groupby('HasDebt').sum())
# Importing the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.io.json import json_normalize
from sqlalchemy import create_engine

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.

Add your notes here

# Add your code snippets here

Explore Datasets

Try using the prompt below to explore the data and practice your skills!

There are three data files in the datasets/ directory of varying kinds: data.db (NYC weather and 311 housing complaints), fcc-new-coder-survey.xlsx ( FreeCodeCamp New Developer Survey response subset), and vt_tax_data_2016.csv (Vermont tax return data by ZIP code).

Import each of these files into a format useful for data analysis in Python, such as specifying data types, handling bad or missing data, and parsing dates. You can also practice importing a specific portion of data from the data files (e.g., certain sheet(s) of an Excel worksheet or using SQL to filter a database on conditions).