Exploring Airbnb Market Trends

There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. As a consultant working for a real estate start-up, you have collected Airbnb listing data from various sources to investigate the short-term rental market in New York. You'll analyze this data to provide insights on private rooms to the real estate company.

Three files containing data on 2019 Airbnb listings are available to you:

data/airbnb_price.csv This is a CSV file containing data on Airbnb listing prices and locations.

listing_id: unique identifier of listing
price: nightly listing price in USD
nbhood_full: name of borough and neighborhood where listing is located

data/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.

listing_id: unique identifier of listing
description: listing description
room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

data/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.

listing_id: unique identifier of listing
host_name: name of listing host
last_review: date when the listing was last reviewed

import pandas as pd
import numpy as np

# load in all the data-sets as individual dfs (initially)
price_df = pd.read_csv('data/airbnb_price.csv')
roomtype_df = pd.read_excel('data/airbnb_room_type.xlsx')
lastreview_df = pd.read_csv('data/airbnb_last_review.tsv', sep='\t')

Though implied from the project instruction, I want to check that the 'listing_ID' is essentially an intersection.

# check the listing_id is an intersection between the three datasets
if all('listing_id' in df.columns for df in [price_df, roomtype_df, lastreview_df]):
    id_sets = [set(df['listing_id']) for df in [price_df, roomtype_df, lastreview_df]]
    shared_ids = set.intersection(*id_sets)
    if shared_ids:
        print(f"Checked. {len(shared_ids)} IDs are found in all three DataFrames.")
    else:
        print("No IDs are shared across all three DataFrames.")
else:
    print("One or more DataFrames are missing the 'listing_id' column.")

# merge the dfs together using listing_id as the intersection, we should expect 7 columns
wide_df = (
    price_df
    .merge(roomtype_df, on='listing_id', how='inner')
    .merge(lastreview_df, on='listing_id', how='inner')
)
wide_df.info()

wide_df.head()

What are the dates of the earliest and most recent reviews?

Store these values as two separate variables with your preferred names.

earliest_review = wide_df['last_review'].min()
earliest_review

mostrecent_review = wide_df['last_review'].max()
mostrecent_review

This is incorrect, looks like it's sorting the last_review column alphabetically which tells me I should of made sure the data-types were cleaned and prepped...

wide_df.head()

# prices are currently all string - '### dollars' - convert to int32
wide_df['price'] = wide_df['price'].astype(str)
wide_df['price'] = (
    wide_df['price']
    .str.replace(' dollars', '', regex=False)
    .astype('int32')
)

# convert last_review into datetime
wide_df['last_review'] = pd.to_datetime(wide_df['last_review'])

Initially when I checked this, I was confused as to why the str dtypes were still appearing as objects. This is where I learned about pandas more recent use adoption of the 'string' type, which appears to be more a preference in current workflows...

# convert all text/object columns into str
wide_df[['nbhood_full', 'description', 'room_type', 'host_name']] = wide_df[['nbhood_full', 'description', 'room_type', 'host_name']].astype('string')

# make room type values consistent casing
wide_df['room_type'] = wide_df['room_type'].str.lower()

# check my moves...
wide_df.info()

Let's try again with the earliest and most recent dates

‌
‌
‌

Exploring Airbnb Market Trends

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}What are the dates of the earliest and most recent reviews?

What are the dates of the earliest and most recent reviews?