Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, with stays ranging from a few nights to many months. In this notebook, we will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx (Excel files).
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
datasets/airbnb_price.csv This is a CSV file containing data on Airbnb listing prices and locations.
listing_id: unique identifier of listing
price: nightly listing price in USD
nbhood_full: name of borough and neighborhood where listing is located
datasets/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
listing_id: unique identifier of listing
description: listing description
room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
datasets/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
listing_id: unique identifier of listing
host_name: name of listing host
last_review: date when the listing was last reviewed
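As a small illustration of how the two text formats differ, the sketch below parses the same made-up record once as comma-separated and once as tab-separated text. The values are hypothetical and not taken from the datasets, and `I()` (which marks a string as literal data) assumes readr 2.0 or later. Excel files are binary, so they need a dedicated reader such as `read_xlsx()` from readxl and are not sketched here.

```r
library(readr)

# Hypothetical one-row samples, NOT taken from the real files
csv_text <- "listing_id,price\n1,100 dollars\n"
tsv_text <- "listing_id\tprice\n1\t100 dollars\n"

# I() tells readr to treat the string as literal data rather than a file path
from_csv <- read_csv(I(csv_text), show_col_types = FALSE)
from_tsv <- read_tsv(I(tsv_text), show_col_types = FALSE)

# Both calls produce the same columns and values; only the delimiter differs
identical(names(from_csv), names(from_tsv))
identical(from_csv$price, from_tsv$price)
```

Note that a price stored with a text suffix is read as a character column, which is why a conversion step is needed before computing numeric summaries.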
# We've loaded your first few packages for you in the first cell. Please feel free to add as many cells as you like!
suppressMessages(library(dplyr)) # This line is required to check your answer correctly
options(readr.show_types = FALSE) # This line is required to check your answer correctly
library(readr)
library(readxl)
library(stringr)
library(lubridate)
# Import data (the description above lists these files under datasets/, so adjust the directory if needed)
price <- read_csv('data/airbnb_price.csv')
review <- read_tsv('data/airbnb_last_review.tsv')
room <- read_xlsx('data/airbnb_room_type.xlsx')
# Create a single table with all data, keeping every listing in price
bnb_joined <- price %>%
  left_join(room, by = 'listing_id') %>%
  left_join(review, by = 'listing_id')
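# Fix data type issue with price: split the text at the first space into the
# number and the currency label, then convert the number to numeric
bnb_joined[c('price', 'currency')] <- str_split_fixed(bnb_joined$price, ' ', 2)
bnb_joined <- bnb_joined %>%
  mutate(price = as.numeric(price))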
# Fix data type issue with last_review: parse month-day-year text into dates
bnb_joined$last_review <- mdy(bnb_joined$last_review)
# Fix inconsistent capitalization in room_type so identical categories match
bnb_joined$room_type <- str_to_lower(bnb_joined$room_type)
# Count the listings that are private rooms
private <- bnb_joined %>%
  group_by(room_type) %>%
  summarize(nb_private_rooms = n()) %>%
  filter(room_type == 'private room') %>%
  select(nb_private_rooms)
# Compute the average price and the earliest and latest review dates
review_dates <- bnb_joined %>%
  summarize(
    avg_price = mean(price),
    first_reviewed = min(last_review),
    last_reviewed = max(last_review)
  )
# Combine both one-row results and put the columns in their final order
review_dates <- cbind(review_dates, private)
review_dates <- review_dates %>%
  select(first_reviewed, last_reviewed, nb_private_rooms, avg_price)
review_dates
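The cleaning and summary steps above can be tried on a tiny hand-made table. This is a sketch under assumptions: the rows in `toy` are invented for illustration (the "number, space, currency" price format and the month-day-year dates are inferred from the parsing calls above, not copied from the data), and counting private rooms with `sum()` inside a single `summarize()` call is one possible alternative to the `group_by()` and `cbind()` approach, not something the notebook requires.

```r
suppressMessages(library(dplyr))
library(stringr)
library(lubridate)

# Hypothetical rows, NOT from the real datasets
toy <- tibble(
  listing_id  = c(1, 2, 3),
  price       = c("100 dollars", "80 dollars", "120 dollars"),
  room_type   = c("Private room", "PRIVATE ROOM", "entire home/apt"),
  last_review = c("May 21 2019", "January 5 2019", "July 9 2019")
)

toy_clean <- toy %>%
  mutate(
    # Keep the text before the first space and convert it to a number
    price = as.numeric(str_split_fixed(price, " ", 2)[, 1]),
    # Normalize capitalization so identical categories match
    room_type = str_to_lower(room_type),
    # Parse month-day-year strings into Date values
    last_review = mdy(last_review)
  )

# One summarize() call computes all four values
toy_summary <- toy_clean %>%
  summarize(
    first_reviewed   = min(last_review),
    last_reviewed    = max(last_review),
    nb_private_rooms = sum(room_type == "private room"),
    avg_price        = mean(price)
  )

toy_summary
```

Because `sum()` adds up a logical vector as ones and zeros, it counts the matching rows directly, which avoids building a separate one-row table and binding it back on.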