Welcome to New York City, one of the most-visited cities in the world. The city has many Airbnb listings to meet the high demand for temporary lodging from travelers, whose stays can range from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv: This is a CSV file containing data on Airbnb listing prices and locations.
- listing_id: unique identifier of listing
- price: nightly listing price in USD
- nbhood_full: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx: This is an Excel file containing data on Airbnb listing descriptions and room types.
- listing_id: unique identifier of listing
- description: listing description
- room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv: This is a TSV file containing data on Airbnb host names and review dates.
- listing_id: unique identifier of listing
- host_name: name of listing host
- last_review: date when the listing was last reviewed
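Since all three files share the listing_id column, they can be read separately and then joined on that key. The following is a minimal sketch of that idea, not the project's solution: the sample rows are hypothetical and held in memory so the example runs without the data folder, and only the CSV and TSV cases are shown (an Excel file would be read with pd.read_excel, which needs an Excel engine such as openpyxl installed).

```python
import io

import pandas as pd

# Hypothetical sample rows that mimic the column layout described above
csv_text = '''listing_id,price,nbhood_full
1,100 dollars,"Manhattan, Midtown"
2,80 dollars,"Brooklyn, Williamsburg"
'''
tsv_rows = [
    ["listing_id", "host_name", "last_review"],
    ["1", "Jennifer", "2019-05-21"],
    ["2", "Lisa", "2019-07-05"],
]
tsv_text = "\n".join("\t".join(row) for row in tsv_rows) + "\n"

# A CSV is comma-separated by default; a TSV only needs a different separator
prices = pd.read_csv(io.StringIO(csv_text))
reviews = pd.read_csv(io.StringIO(tsv_text), sep="\t", parse_dates=["last_review"])

# listing_id appears in both tables, so it can serve as the join key
combined = prices.merge(reviews, on="listing_id", how="inner")
print(combined)
```

An inner join keeps only listings present in both tables; whether that is the right choice for the real files depends on how complete each one is.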
# Import necessary packages
import os
import numpy as np
import pandas as pd
# File names of the three data sources
csv = 'airbnb_price.csv'
excel = 'airbnb_room_type.xlsx'
tsv = 'airbnb_last_review.tsv'
# Confirm the working directory and check that it contains the data folder
current_dir = os.getcwd()
print(current_dir)
print(os.listdir(current_dir))
file_path = os.path.join(current_dir, 'data', csv)
airbnb_price = pd.read_csv(file_path)
file_path2 = os.path.join(current_dir, 'data', excel)
# Read the sheet named 'airbnb_room_type' directly into a DataFrame
room_type = pd.read_excel(file_path2, sheet_name='airbnb_room_type')
#print(room_type.head())
file_path3 = os.path.join(current_dir, 'data', tsv)
airbnb_review = pd.read_csv(file_path3, sep='\t')
#print(airbnb_review['last_review'].dtype)
airbnb_review['last_review'] = pd.to_datetime(airbnb_review['last_review'])
#print(airbnb_review['last_review'].dtype)
# min() and max() skip missing dates, whereas sorting places them last,
# so tail(1) could return NaT if any review date were missing
earliest_date = airbnb_review['last_review'].min()
latest_date = airbnb_review['last_review'].max()
#print(room_type.head())
#print(room_type['room_type'].unique())
#print(room_type['room_type'].value_counts())
# Lower-case the labels so that values differing only in capitalisation are counted together
room_type['room_type'] = room_type['room_type'].str.lower()
room_types = room_type['room_type'].value_counts()
listing_per_room = dict(room_types)
private_room_listings = listing_per_room['private room']
#print(airbnb_price.head())
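# str.strip() treats its argument as a set of characters rather than a suffix,
# so remove the literal ' dollars' text instead
airbnb_price['price'] = airbnb_price['price'].str.replace(' dollars', '', regex=False)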
#print(airbnb_price.head())
airbnb_price['price'] = airbnb_price['price'].astype('int')
ave = airbnb_price['price'].mean()
ave_price = ave.round(2)
review_dates_dict = {
    'first_reviewed': earliest_date,
    'last_reviewed': latest_date,
    'nb_private_rooms': private_room_listings,
    'avg_price': ave_price,
}
print(review_dates_dict)
review_dates = pd.DataFrame([review_dates_dict])
print(review_dates)
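The steps above can also be gathered into a single function, which makes them easy to check on a small table before running them on the full files. This is only a sketch under stated assumptions: summarize_listings is a hypothetical helper name, the toy rows are invented for illustration, and the prices are assumed to be strings of the form "<number> dollars" as in the notebook.

```python
import pandas as pd


def summarize_listings(prices, rooms, reviews):
    """Return a one-row DataFrame with the four summary values (hypothetical helper)."""
    # Remove the literal " dollars" suffix before converting to integers
    price = prices["price"].str.replace(" dollars", "", regex=False).astype(int)
    # Lower-case the labels so capitalisation variants are counted together
    room_type = rooms["room_type"].str.lower()
    dates = pd.to_datetime(reviews["last_review"])
    return pd.DataFrame(
        {
            "first_reviewed": [dates.min()],
            "last_reviewed": [dates.max()],
            "nb_private_rooms": [int((room_type == "private room").sum())],
            "avg_price": [round(float(price.mean()), 2)],
        }
    )


# Hypothetical toy data, not taken from the real files
prices = pd.DataFrame({"listing_id": [1, 2, 3],
                       "price": ["100 dollars", "80 dollars", "65 dollars"]})
rooms = pd.DataFrame({"listing_id": [1, 2, 3],
                      "room_type": ["Private room", "PRIVATE ROOM", "entire home/apt"]})
reviews = pd.DataFrame({"listing_id": [1, 2, 3],
                        "last_review": ["2019-05-21", "2019-07-05", "2019-01-01"]})

summary = summarize_listings(prices, rooms, reviews)
print(summary)
```

Building the DataFrame from a dict of one-element lists gives the same one-row shape as wrapping a dict in a list, as done in the notebook.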