Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, with stays ranging from a few nights to many months. In this project, you will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx (Excel files).
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv
This is a CSV file containing data on Airbnb listing prices and locations.
- listing_id: unique identifier of listing
- price: nightly listing price in USD
- nbhood_full: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx
This is an Excel file containing data on Airbnb listing descriptions and room types.
- listing_id: unique identifier of listing
- description: listing description
- room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv
This is a TSV file containing data on Airbnb host names and review dates.
- listing_id: unique identifier of listing
- host_name: name of listing host
- last_review: date when the listing was last reviewed
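All three files share the listing_id column, which makes it the natural key for combining them. The sketch below uses made-up inline data instead of the real files to show that a CSV and a TSV differ only in their delimiter, and that an inner join keeps only the listings present on both sides. The toy values and the names price_csv, review_tsv, toy_price, toy_review, toy_joined, and toy_unmatched are illustrative assumptions, and the code assumes readr 2.0 or later (for I() literal input and show_col_types).

```r
library(readr)
library(dplyr)

# Made-up stand-ins for two of the files (not the real data).
# I() tells readr to treat the string as literal file contents.
price_csv  <- I("listing_id,price\n1,225 dollars\n2,89 dollars\n3,200 dollars\n")
review_tsv <- I("listing_id\thost_name\n1\tAna\n2\tBen\n4\tCy\n")

# Same kind of table, different delimiter: comma vs. tab
toy_price  <- read_csv(price_csv, show_col_types = FALSE)
toy_review <- read_tsv(review_tsv, show_col_types = FALSE)

# inner_join keeps only listing_id values found in both tables:
# listing 3 has no review row and listing 4 has no price row
toy_joined <- inner_join(toy_price, toy_review, by = "listing_id")

# anti_join lists the price rows that would be dropped by the join
toy_unmatched <- anti_join(toy_price, toy_review, by = "listing_id")

toy_joined
toy_unmatched
```

Running anti_join() on the real data frames before merging is one way to see how many listings an inner join would silently discard.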
# Load necessary libraries
library(readr)
library(readxl)
library(dplyr)
library(stringr)
# Read the CSV, Excel, and TSV files
airbnb_price <- read_csv('data/airbnb_price.csv')
airbnb_room_type <- read_excel('data/airbnb_room_type.xlsx')
airbnb_last_review <- read_tsv('data/airbnb_last_review.tsv')
# Merge the three data frames
airbnb <- inner_join(airbnb_price, airbnb_room_type, by = "listing_id") %>%
  inner_join(airbnb_last_review, by = "listing_id")
# Convert last_review to Date format
# (the format string expects full month name, day, year, e.g. "May 21 2019")
airbnb <- airbnb %>%
  mutate(last_review = as.Date(last_review, format = "%B %d %Y"))
# Earliest and most recent reviews
earliest_review <- min(airbnb$last_review, na.rm = TRUE)
most_recent_review <- max(airbnb$last_review, na.rm = TRUE)
# Count of private rooms (lowercase first so capitalization variants match)
nb_private_rooms <- airbnb %>%
  filter(str_to_lower(room_type) == "private room") %>%
  nrow()
# Average listing price: strip the " dollars" suffix, convert to numeric,
# then round the mean to two decimal places
average_listing_price <- mean(as.numeric(str_remove(airbnb$price, " dollars")), na.rm = TRUE)
average_listing_price <- round(average_listing_price, 2)
# Create review_dates tibble
review_dates <- tibble(
  first_reviewed = earliest_review,
  last_reviewed = most_recent_review,
  nb_private_rooms = nb_private_rooms,
  avg_price = average_listing_price
)
review_dates
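To see what each cleaning step does to individual values, the same logic can be traced on a three-row toy table. This is only a sketch: the rows are invented, the names toy, toy_clean, toy_summary, and price_usd are hypothetical, and the date parsing assumes an English locale so that %B recognizes month names such as "May".

```r
library(dplyr)
library(stringr)

# Invented rows that mimic the raw column formats handled by the script above
toy <- tibble(
  price       = c("225 dollars", "89 dollars", "200 dollars"),
  room_type   = c("Private room", "PRIVATE ROOM", "Entire home/apt"),
  last_review = c("May 21 2019", "July 05 2019", "June 23 2019")
)

toy_clean <- toy %>%
  mutate(
    # "225 dollars" -> "225" -> 225
    price_usd   = as.numeric(str_remove(price, " dollars")),
    # "PRIVATE ROOM" and "Private room" both become "private room"
    room_type   = str_to_lower(room_type),
    # "May 21 2019" -> Date (relies on English month names)
    last_review = as.Date(last_review, format = "%B %d %Y")
  )

# Same four summary values as review_dates, computed in one summarize() call
toy_summary <- toy_clean %>%
  summarize(
    first_reviewed   = min(last_review, na.rm = TRUE),
    last_reviewed    = max(last_review, na.rm = TRUE),
    nb_private_rooms = sum(room_type == "private room"),
    avg_price        = round(mean(price_usd, na.rm = TRUE), 2)
  )

toy_summary
```

If as.Date() returns NA for every row on the real data, the usual suspects are a non-English locale or a date format that differs from "%B %d %Y".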