Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging, with stays lasting anywhere from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv This is a CSV file containing data on Airbnb listing prices and locations.
  listing_id: unique identifier of listing
  price: nightly listing price in USD
  nbhood_full: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
  listing_id: unique identifier of listing
  description: listing description
  room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
  listing_id: unique identifier of listing
  host_name: name of listing host
  last_review: date when the listing was last reviewed
# Import necessary packages
import pandas as pd
import numpy as np
# Load the three source files and inspect each one
abb_price = pd.read_csv("data/airbnb_price.csv")
abb_price.head()
abb_price.info()
abb_rooms = pd.read_excel("data/airbnb_room_type.xlsx")
abb_rooms.head()
abb_rooms.info()
abb_reviews = pd.read_csv("data/airbnb_last_review.tsv", sep="\t")
abb_reviews.head()
abb_reviews.info()
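All three files share the listing_id column, so one way to combine them is to merge on that key. The following is a minimal sketch on tiny stand-in frames rather than the real files; the values are invented for illustration, and the choice of an inner join is an assumption (it keeps only listings present in every table).

```python
import pandas as pd

# Tiny stand-in frames; all values are invented for illustration only.
prices = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "price": ["225 dollars", "89 dollars", "200 dollars"],
})
rooms = pd.DataFrame({
    "listing_id": [1, 2, 4],
    "room_type": ["Entire home/apt", "private room", "Shared room"],
})
reviews = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "last_review": ["2019-05-21", "2019-07-05", "2019-06-22"],
})

# Inner joins keep only the listings that appear in all three tables.
combined = (
    prices
    .merge(rooms, on="listing_id", how="inner")
    .merge(reviews, on="listing_id", how="inner")
)
print(combined)
```

With how="outer" instead, listings missing from one table would be kept and filled with NaN, which can help when checking for gaps between the files.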
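# Add a new column, last_review_date, holding last_review parsed as datetimes.
# errors='coerce' turns values that cannot be parsed into NaT instead of raising.
# (infer_datetime_format is deprecated since pandas 2.0, so it is omitted here.)
abb_reviews_date = pd.to_datetime(abb_reviews['last_review'], errors='coerce')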
abb_reviews["last_review_date"] = abb_reviews_date
abb_reviews.head()
# What are the earliest and most recent review dates?
# max() and min() do not need the data to be sorted first, and they skip NaT.
recent_rev_date = abb_reviews_date.max()
earliest_rev_date = abb_reviews_date.min()
print(recent_rev_date, earliest_rev_date)
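To see what errors='coerce' does in the date parsing above, here is a small sketch on made-up strings (not taken from the data file): the unparseable entry becomes NaT, and min() and max() ignore it.

```python
import pandas as pd

# Made-up sample values; the last one is deliberately not a date.
raw = pd.Series(["2019-05-21", "2019-07-05", "not a date"])

# Unparseable entries become NaT rather than raising an error.
parsed = pd.to_datetime(raw, errors="coerce")

n_missing = parsed.isna().sum()
print(n_missing)                   # number of values that failed to parse
print(parsed.min(), parsed.max())  # NaT is skipped by min() and max()
```

Checking the NaT count after parsing is a cheap way to notice whether a large share of dates failed to convert.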
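# How many of the listings are private rooms?
# Clean up "room_type": lowercase everything so capitalization variants match
abb_rooms["room_type"] = abb_rooms["room_type"].str.lower()
# value_counts() returns a one-element Series indexed by 'private room'
privat = abb_rooms[abb_rooms["room_type"]=='private room']["room_type"].value_counts()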
print(privat)
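If only the number itself is needed, a boolean comparison can be summed directly. This sketch uses an invented Series with mixed capitalization; the extra str.strip() is an assumption about possible stray whitespace, not something the data is known to contain.

```python
import pandas as pd

# Invented sample with inconsistent capitalization and padding.
room_type = pd.Series([
    "Private room", "PRIVATE ROOM", " private room",
    "Entire home/apt", "shared room",
])

# Normalize, then count the rows equal to "private room".
normalized = room_type.str.strip().str.lower()
n_private = int((normalized == "private room").sum())
print(n_private)  # 3
```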
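# What is the average listing price?
# Clean up the price column: remove the literal " dollars" suffix.
# (str.strip(' dollars') would strip any of those characters from both ends,
# not the suffix as a whole.)
abb_price["price_dollars"] = abb_price["price"].str.replace(" dollars", "", regex=False)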
abb_price["price_dollars"] = abb_price["price_dollars"].astype('float')
avg_price = np.round(abb_price["price_dollars"].mean(), decimals=2)
print(avg_price , "$")
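A note on the price cleanup: str.strip takes a set of characters, not a suffix. The sketch below shows the difference on an invented string, then cleans a few hypothetical prices in the "<number> dollars" style by replacing the literal substring.

```python
import pandas as pd

# str.strip treats its argument as a set of characters, not a literal suffix,
# so matching characters at the start of the string are removed as well.
stripped = "solar 50 dollars".strip(" dollars")
print(stripped)  # 50

# Hypothetical sample values in the "<number> dollars" style.
price = pd.Series(["225 dollars", "89 dollars", "200 dollars"])

# Replace the literal substring, then convert to float.
price_dollars = price.str.replace(" dollars", "", regex=False).astype(float)
print(price_dollars.tolist())  # [225.0, 89.0, 200.0]
```

For purely numeric prices both approaches give the same result, since digits are not in the stripped character set; the literal replacement just makes the intent explicit.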
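# Combine the new variables into one DataFrame.
# privat is a one-element Series, so its index ('private room') labels the single
# row and the scalar values are broadcast along it.
dates_dict = {
    'first_reviewed': earliest_rev_date,
    'last_reviewed': recent_rev_date,
    'nb_private_rooms': privat,
    'avg_price': avg_price,
}
review_dates = pd.DataFrame.from_dict(dates_dict)
assert isinstance(review_dates, pd.DataFrame)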
print(review_dates)
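An alternative way to build a one-row summary is to wrap each scalar in a list, which gives a plain 0-based index. This is only a sketch with placeholder values, not the results of the analysis; passing bare scalars without an index would raise a ValueError in pandas.

```python
import pandas as pd

# Placeholder values for illustration only; they are not results from the data.
summary = pd.DataFrame({
    "first_reviewed": [pd.Timestamp("2019-01-01")],
    "last_reviewed": [pd.Timestamp("2019-12-31")],
    "nb_private_rooms": [3],
    "avg_price": [100.0],
})
print(summary)
```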