Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, whose stays can last anywhere from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv: a CSV file containing data on Airbnb listing prices and locations.
- listing_id: unique identifier of listing
- price: nightly listing price in USD
- nbhood_full: name of borough and neighborhood where listing is located

data/airbnb_room_type.xlsx: an Excel file containing data on Airbnb listing descriptions and room types.
- listing_id: unique identifier of listing
- description: listing description
- room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

data/airbnb_last_review.tsv: a TSV file containing data on Airbnb host names and review dates.
- listing_id: unique identifier of listing
- host_name: name of listing host
- last_review: date when the listing was last reviewed
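As a quick illustration of how the delimited formats differ when loading, here is a minimal sketch that reads a comma-separated and a tab-separated sample with pandas. The two-row samples and their values are made up for illustration and are not taken from the real files; only the column names come from the descriptions above.

```python
import io
import pandas as pd

# Made-up two-row samples standing in for the real files (values are hypothetical)
csv_text = (
    "listing_id,price,nbhood_full\n"
    '1,225 dollars,"Manhattan, Midtown"\n'
    '2,89 dollars,"Brooklyn, Clinton Hill"\n'
)
tsv_text = (
    "listing_id\thost_name\tlast_review\n"
    "1\tJennifer\tMay 21 2019\n"
    "2\tLisaRoxanne\tJuly 05 2019\n"
)

# CSV: comma is the default delimiter; quoted fields may contain commas
prices = pd.read_csv(io.StringIO(csv_text))

# TSV: same reader, but with an explicit tab separator
reviews = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# An .xlsx workbook is read with pd.read_excel(path) instead,
# which requires an Excel engine such as openpyxl to be installed.

print(prices.columns.tolist())
print(reviews.columns.tolist())
```

Note that the quoted nbhood_full values keep their internal comma, so the borough and neighborhood stay in a single column until we split them ourselves.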
# Import necessary packages
import pandas as pd
import numpy as np
import seaborn as sns
# Begin coding here ...
airbnb_review = pd.read_csv('data/airbnb_last_review.tsv', sep='\t')
airbnb_review['last_review_dt'] = pd.to_datetime(airbnb_review['last_review'])
review_sort_rec = airbnb_review.sort_values(by='last_review_dt', ascending=False)
review_sort_early = airbnb_review.sort_values(by='last_review_dt', ascending=True)
recent = airbnb_review['last_review_dt'].max()
earliest = airbnb_review['last_review_dt'].min()
airbnb_review.head()
#review_sort_rec
#review_sort_early
#airbnb_review.dtypes
#recent
#earliest
room = pd.read_excel('data/airbnb_room_type.xlsx')
room['room_type'] = room['room_type'].str.title()
room_type_nos = room['room_type'].value_counts()
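# Look up the count by label rather than by position, so the result
# does not depend on how value_counts() happens to order the categories
private_nos = room_type_nos['Private Room']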
room.head()
#room['room_type'].unique()
#private.head()
#room_type_nos
#private_nos
airbnb_price = pd.read_csv('data/airbnb_price.csv')
airbnb_price['price'] = airbnb_price['price'].replace(' dollars', '', regex=True).astype(float)
avg_price = round(airbnb_price['price'].mean(), 2)
airbnb_price.head()
#airbnb_price.dtypes
#airbnb_price.sort_values(by='price', ascending=False)
#avg_price
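The price column arrives as text with a " dollars" suffix, which is why it has to be cleaned before averaging. A minimal sketch of one way to do this while also checking for values that fail to convert is shown here; the sample strings are made up, and using pd.to_numeric with errors="coerce" is our assumption about a reasonable safeguard, not something the dataset requires.

```python
import pandas as pd

# Made-up price strings in the same "<number> dollars" shape as the price column
raw = pd.Series(["225 dollars", "89 dollars", "200 dollars"])

# Strip the literal suffix, then convert to numbers;
# errors="coerce" turns anything unparseable into NaN instead of raising
price = pd.to_numeric(raw.str.replace(" dollars", "", regex=False), errors="coerce")

# Count the values that could not be converted before trusting the average
print(price.isna().sum())
print(round(price.mean(), 2))
```

If the count of NaN values is non-zero, the offending rows are worth inspecting before computing the average.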
review_dates = pd.DataFrame({
    'first_reviewed': [earliest],
    'last_reviewed': [recent],
    'nb_private_rooms': [private_nos],
    'avg_price': [avg_price]
})
review_dates
airbnb_room_review = pd.merge(room, airbnb_review, on='listing_id')
airbnb = pd.merge(airbnb_room_review, airbnb_price, on='listing_id')
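# Split "Borough, Neighborhood" once on the comma and strip the
# whitespace that would otherwise be left in front of the neighborhood
borough = airbnb['nbhood_full'].str.split(',', n=1, expand=True)
airbnb['borough'] = borough[0].str.strip()
airbnb['nbhood'] = borough[1].str.strip()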
revmth = airbnb['last_review'].str.split(' ', expand=True)
airbnb['review_month_no'] = airbnb['last_review_dt'].dt.month
airbnb['review_month'] = revmth[0]
#airbnb_room_review.head()
airbnb.head()
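Because the three tables are joined on listing_id, it can be worth checking that the key is unique in each table and that no listings are silently dropped. Here is a minimal sketch using the validate and indicator options of pd.merge on two small made-up frames; the frames and their values are hypothetical and only mimic the column names used above.

```python
import pandas as pd

# Made-up frames that mimic the room and price tables (values are hypothetical)
room_demo = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "room_type": ["Private Room", "Entire Home/Apt", "Shared Room"],
})
price_demo = pd.DataFrame({
    "listing_id": [1, 2, 4],
    "price": [89.0, 225.0, 60.0],
})

# validate raises MergeError if listing_id is duplicated on either side;
# indicator adds a "_merge" column saying where each row came from
merged = pd.merge(
    room_demo, price_demo,
    on="listing_id", how="outer",
    validate="one_to_one", indicator=True,
)

# Rows present in only one of the two tables
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["listing_id", "_merge"]])
```

With the default inner join used above, any unmatched listings would simply disappear from the combined table, so a check like this makes that loss visible.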
# Listing counts by room type, borough, neighborhood, and review month
room_type = airbnb['room_type'].value_counts()
borough_no = airbnb['borough'].value_counts()
nbhood = airbnb[['borough', 'nbhood']].value_counts().reset_index(name='no_of_listings')
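# Name the index explicitly instead of relying on a column called 'index',
# which reset_index() no longer produces for value_counts() in pandas 2.x
rev = airbnb['review_month_no'].value_counts().sort_index().rename_axis('month_no').reset_index(name='no_of_listings')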
month_map = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June', 7: 'July', 8: 'August', 9: 'September', 10: 'October', 11: 'November', 12: 'December'}
rev['month'] = rev['month_no'].map(month_map)
room_borough = airbnb[['borough', 'room_type']].value_counts().reset_index(name='no_of_listings')
room_borough['%'] = round(((room_borough['no_of_listings'])/(room_borough['no_of_listings'].sum())) * 100, 2)
#room_type
#rev
#nbhood
#borough_no
room_borough
borough_price = round(airbnb.groupby('borough')['price'].mean(), 2)
nbhood_price = round(airbnb.groupby(['borough', 'nbhood'])['price'].mean(), 2).sort_values(ascending=False).reset_index()
room_price = round(airbnb.groupby('room_type')['price'].mean(),2)
nbhood_room = round(airbnb.groupby(['borough', 'room_type'])['price'].mean().sort_values(ascending=False), 2)
#borough_price
#nbhood_price
#nbhood_price.median()
#room_price
nbhood_room
import seaborn.objects as so
#sns.barplot(data=borough_no, palette='husl')
sns.lineplot(x='month', y='no_of_listings', data=rev)
sns.scatterplot(x='month', y='no_of_listings', data=rev)
# Create a stacked bar plot of room types in the boroughs
#so.Plot(room_borough, x='borough', y='%', color='room_type').add(so.Bar(), so.Stack())