Streamlined Data Ingestion with pandas
Run the hidden code cell below to import the packages used in this course.
# Create string of lettered columns to load
col_string = 'AD, AW:BA'
# Load data with skiprows and usecols set
survey_responses = pd.read_excel("fcc_survey_headers.xlsx",
usecols=col_string,
skiprows=2)
# View the names of the columns selected
print(survey_responses.columns# Create an empty dataframe
all_responses = pd.DataFrame()
# Set up for loop to iterate through values in responses
for df in responses.values():
# Print the number of rows being added
print("Adding {} rows".format(df.shape[0]))
# Append df to all_responses, assign result
all_responses = all_responses.append(df)
# Graph employment statuses in sample
counts = all_responses.groupby("EmploymentStatus").EmploymentStatus.count()
counts.plot.barh()
plt.show()# Set dtype to load appropriate column(s) as Boolean data
survey_data = pd.read_excel("fcc_survey_subset.xlsx",
dtype={'HasDebt':bool})
# View financial burdens by Boolean group
print(survey_data.groupby('HasDebt').sum())# Importing the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.io.json import json_normalize
from sqlalchemy import create_engineTake Notes
Add notes about the concepts you've learned and code cells with code you want to keep.
Add your notes here
# Add your code snippets hereExplore Datasets
Try using the prompt below to explore the data and practice your skills!
There are three data files in the datasets/ directory of varying kinds: data.db (NYC weather and 311 housing complaints), fcc-new-coder-survey.xlsx ( FreeCodeCamp New Developer Survey response subset), and vt_tax_data_2016.csv (Vermont tax return data by ZIP code).
Import each of these files into a format useful for data analysis in Python, such as specifying data types, handling bad or missing data, and parsing dates. You can also practice importing a specific portion of data from the data files (e.g., certain sheet(s) of an Excel worksheet or using SQL to filter a database on conditions).