The Housing Market Might Not Be That Bad
As a newly-wed home seeker, I've had to stare down the barrel of our current housing market and try not to flinch.
It's not easy.
But it got me wondering just how bad things have gotten, so I dug into the home price, household income, and inflation data to see how the current market compares to the last 25 years.
What I found genuinely surprised me...
To start, we'll have to pull the average housing price for each state from the Zillow Home Value Index, which downloads as a .csv file that we can save as an .xlsx file and manipulate with Python.
We need to modify the format of this data for two reasons:
- We want annual averages for housing data to match median household income data.
- We want all of the date and state information to fit into single columns, instead of each date having its own column, as this makes the data much easier to report on in Tableau (see the quick sketch below).
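To make that second point concrete, here's a toy sketch of the wide-to-long reshape we're after, using pandas' melt on made-up values (the real walkthrough below arrives at the same shape with a transpose and a loop):

import pandas as pd

# Wide format, like the Zillow export: one column per date (values are hypothetical)
wide = pd.DataFrame({
    'RegionName': ['California', 'Texas'],
    '2000-01-31': [200000, 110000],
    '2000-02-29': [201000, 111000],
})

# Long format: one row per state/date pair, with single state, date, and price columns
long_format = wide.melt(id_vars='RegionName', var_name='date', value_name='unadjusted_price')
print(long_format)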
To make these changes, we're going to leverage the Pandas library in Python:
I've uploaded the completed .xlsx file because DataLab won't allow me to modify attached files in this repository, but if you get the data from Zillow yourself, you can follow along locally on your own computer.
We'll start by importing the Pandas library and reading the Excel file as a DataFrame:
import pandas as pd
housing_data = pd.read_excel('all_housing_data.xlsx', sheet_name='raw_housing_data')
print(housing_data)

Now we're going to...
- Drop all of the unnecessary columns from the Zillow export,
- Transpose the DataFrame so that the dates become the row indexes,
- Rename the columns to the state names, as the current column headers are the index numbers from before the transposition.
housing_data = housing_data.drop(['RegionID', 'SizeRank', 'RegionType', 'StateName'], axis=1)
housing_data = housing_data.transpose()
# After transposing, the first row holds the state names; promote it to the header, then drop it
housing_data.columns = housing_data.iloc[0]
housing_data.drop('RegionName', inplace=True)
print(housing_data)

Next, we'll create an empty DataFrame to collect all of the state data in a single column, then loop through the columns in the existing DataFrame to collect housing data for each individual state, concatenating each one to the bottom of the DataFrame we just created:
all_state_data = pd.DataFrame()

for column in housing_data:
    # Pull one state's prices, tag each row with the state name,
    # then standardize the column names and their order
    state = pd.DataFrame(housing_data[column])
    state['State'] = column
    state.columns = ('unadjusted_price', 'state')
    state = state[['state', 'unadjusted_price']]
    all_state_data = pd.concat([all_state_data, state])
print(all_state_data)

At this point, I would save the DataFrame to a .csv that we'll upload to a SQL database, so we can group all of the state data by year and get the average home price for each state in each year.
If you're following along, you can use the Python code below to create that .csv:
# all_state_data.to_csv('all_housing_data.csv')

Because we can't save files to this repository via Python, I've uploaded the .csv as a data source so we can manipulate it via SQL.
Something I want to note: I use a PostgreSQL database to store the table and Beekeeper Studio to query it, and despite all my efforts, I can't find a way to recreate it in DataLab, so I'm just going to post the SQL I used to modify the data below:
SELECT EXTRACT(YEAR FROM date) AS year, state, AVG(unadjusted_price) AS unadjusted_price
FROM all_housing_data
GROUP BY state, year
ORDER BY state, year;

COPY (
    SELECT EXTRACT(YEAR FROM date) AS year, state, AVG(unadjusted_price) AS unadjusted_price
    FROM all_housing_data
    GROUP BY state, year
    ORDER BY state, year
) TO 'housing_data_grouped.csv' DELIMITER ',' CSV HEADER;

(Note that COPY ... TO writes the file on the database server itself, so it needs filesystem permissions there; if that's a problem, psql's client-side \copy command does the same thing on your own machine.)

Here's a screenshot of what that query looks like on my end:
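If you don't have a PostgreSQL database handy, the same grouping can be done in pandas instead; here's a minimal sketch, assuming the all_housing_data.csv we exported above, with the dates in its first column:

import pandas as pd

# Read the long-format data back in, parsing the date index
all_housing = pd.read_csv('all_housing_data.csv', index_col=0, parse_dates=True)

# Group by state and year and average the monthly prices (the same logic as the SQL above)
grouped = (
    all_housing
    .groupby(['state', all_housing.index.year.rename('year')])['unadjusted_price']
    .mean()
    .reset_index()
)

grouped.to_csv('housing_data_grouped.csv', index=False)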
You'll want to import the housing_data_grouped.csv file into the all_housing_data.xlsx Excel file we created from the initial Zillow export.
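You can do that import by hand in Excel, or let pandas append it as a new sheet; a quick sketch, assuming openpyxl is installed and that the sheet name grouped_housing_data (my choice, not from the original workbook) isn't already taken:

import pandas as pd

grouped = pd.read_csv('housing_data_grouped.csv')

# Append the grouped data as a new sheet, leaving the existing sheets untouched
with pd.ExcelWriter('all_housing_data.xlsx', mode='a', engine='openpyxl') as writer:
    grouped.to_excel(writer, sheet_name='grouped_housing_data', index=False)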
Next, we're going to grab median household income data from the Federal Reserve Bank of St. Louis's FRED database, using the Requests library in Python to pull the information from their website.
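Conveniently, FRED serves any series as a plain .csv through its fredgraph endpoint, so we don't need to parse any HTML. Here's a minimal sketch; the series ID below is my assumption (median household income for California), so swap in the series you actually need:

import requests
import pandas as pd
from io import StringIO

# Assumed series ID: median household income in California; replace with your own series
url = 'https://fred.stlouisfed.org/graph/fredgraph.csv?id=MEHOINUSCAA646N'

response = requests.get(url)
response.raise_for_status()  # stop early if the download failed

income_data = pd.read_csv(StringIO(response.text))
print(income_data.head())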