Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. In this notebook, we will take a closer look at the New York Airbnb market by combining data from multiple file types like .csv
, .tsv
, and .xlsx
.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv
listing_id
: unique identifier of listingprice
: nightly listing price in USDnbhood_full
: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
listing_id
: unique identifier of listingdescription
: listing descriptionroom_type
: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
listing_id
: unique identifier of listinghost_name
: name of listing hostlast_review
: date when the listing was last reviewed
Our goals are to convert untidy data into appropriate formats to analyze, and answer key questions including:
- What is the average price, per night, of an Airbnb listing in NYC?
- How does the average price of an Airbnb listing, per month, compare to the private rental market?
- How many adverts are for private rooms?
- How do Airbnb listing prices compare across the five NYC boroughs?
# We've loaded your first package for you! You can add as many cells as you need.
import numpy as np
import pandas as pd
airbnb_analysis={}
#Import the data
prices = pd.read_csv('data/airbnb_price.csv')
types = pd.read_excel('data/airbnb_room_type.xlsx')
reviews = pd.read_csv('data/airbnb_last_review.tsv', sep='\t')
print(prices.head())
print(types.head())
print(reviews.head())
#Delete unnecessary tables
del types['description']
del reviews['host_name']
#Look For Duplicates
print(prices.isna().any())
print(types.isna().any())
print(reviews.isna().any())
prices['price'],y = prices['price'].str.split(' ').str
prices['price'] = prices['price'].astype(int)
import matplotlib.pyplot as plt
prices.boxplot(column='price')
plt.yscale('log')
plt.show()
# Subset prices for listings costing $0, free_listings
free_listings = prices["price"] == 0
# Update prices by removing all free listings from prices
prices = prices.loc[~free_listings]
airbnb_analysis['avg_price'] = prices['price'].mean().round(2)
print(airbnb_analysis)
prices['price_per_month'] = prices['price'] * 365 /12
airbnb_analysis['average_price_per_month'] = prices['price_per_month'].mean().round(2)
airbnb_analysis['difference'] = (airbnb_analysis['average_price_per_month'] - 3100).round(2)
print(airbnb_analysis)
types['room_type'] = types['room_type'].str.lower().astype('category')
print(types['room_type'].unique())
airbnb_analysis['room_frequency'] = types["room_type"].value_counts()
print(airbnb_analysis)
reviews['last_review']=pd.to_datetime(reviews['last_review'])
airbnb_analysis['first_reviewed'] = reviews['last_review'].min()
airbnb_analysis['last_reviewed'] = reviews['last_review'].max()
print(airbnb_analysis)
rooms_and_prices = prices.merge(types, on='listing_id', how='outer')
airbnb_merged = rooms_and_prices.merge(reviews, on='listing_id', how='outer')
airbnb_merged.dropna(inplace=True)
print(airbnb_merged.duplicated().sum())
airbnb_merged['borough'] = airbnb_merged["nbhood_full"].str.partition(', ')[0]
boroughs = airbnb_merged.groupby('borough')['price'].agg([np.mean, np.median, sum, 'count']).round(2).sort_values('mean')
print(boroughs)
label_names = ["Budget", "Average", "Expensive", "Extravagant"]
ranges = [0, 69, 175, 350, np.inf]
airbnb_merged['price_range'] = pd.cut(airbnb_merged['price'], bins=ranges, labels=label_names)
# Group by borough and price_range
prices_by_borough = airbnb_merged.groupby(["borough", "price_range"])["price_range"].count()
airbnb_analysis['prices_by_borough'] = prices_by_borough
print(airbnb_analysis)