Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging from travelers, with stays lasting anywhere from a few nights to many months. In this notebook, we will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx (Excel files).
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
datasets/airbnb_price.csv This is a CSV file containing data on Airbnb listing prices and locations.
- listing_id: unique identifier of listing
- price: nightly listing price in USD
- nbhood_full: name of borough and neighborhood where listing is located
datasets/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
- listing_id: unique identifier of listing
- description: listing description
- room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
datasets/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
- listing_id: unique identifier of listing
- host_name: name of listing host
- last_review: date when the listing was last reviewed
# We've loaded your first few packages for you in the first cell. Please feel free to add as many cells as you like!
suppressMessages(library(dplyr)) # This line is required to check your answer correctly
options(readr.show_types = FALSE) # This line is required to check your answer correctly
library(readr)
library(readxl)
library(stringr)
library(tidyverse) # also attaches dplyr, readr, and stringr, so no separate library(dplyr) call is needed here
price_listing <- read_csv('data/airbnb_price.csv')
# Strip the ' dollars' suffix, convert price to numeric, and drop the raw column
price_listing_clean <- price_listing %>%
  mutate(price_clean = as.numeric(gsub(' dollars', '', price))) %>%
  select(-price)
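The price cleaning step can be tried in isolation. The sketch below uses base R only and a few made-up values; it assumes the raw column holds text of the form "<number> dollars", which is what the gsub() pattern above implies. The names raw_price and price_clean are illustrative, not part of the dataset.

```r
# Hypothetical sample values in the assumed "<number> dollars" shape
raw_price <- c("225 dollars", "89 dollars", "1000 dollars")

# Remove the literal " dollars" suffix, then convert the remaining text to numeric
price_clean <- as.numeric(gsub(" dollars", "", raw_price, fixed = TRUE))

price_clean        # numeric vector, ready for arithmetic
mean(price_clean)  # average of the cleaned values
```

If a value did not follow this pattern (for example, "225 USD"), as.numeric() would return NA with a warning, so checking sum(is.na(price_clean)) after conversion is a cheap sanity test.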
# Read room types and standardize capitalization so each category has a single spelling
room <- read_excel('data/airbnb_room_type.xlsx') %>%
  mutate(room_types = str_to_title(room_type)) %>%
  select(-room_type)
review <- read_tsv('data/airbnb_last_review.tsv')
# Parse last_review text (full month name, day, four-digit year) into Date objects
review_clean <- review %>%
  mutate(last = as.Date(last_review, format = "%B %d %Y"))
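To see what the format string does, here is a small base R sketch with made-up dates. It assumes last_review is written as full month name, day, and year (for example "May 21 2019"), which is what "%B %d %Y" expects; raw_dates, parsed, and mismatch are illustrative names. Note that %B matches month names in the current locale, so this assumes an English locale.

```r
# Hypothetical sample values in "Month day year" form
raw_dates <- c("May 21 2019", "January 01 2019", "July 09 2019")

# %B = full month name, %d = day of month, %Y = four-digit year
parsed <- as.Date(raw_dates, format = "%B %d %Y")

range(parsed)  # earliest and latest date

# Text that does not match the format parses to NA rather than raising an error
mismatch <- as.Date("2019-05-21", format = "%B %d %Y")
is.na(mismatch)
```

Because mismatches silently become NA, min() and max() on the parsed column would also return NA unless na.rm = TRUE is set or the NAs are handled first.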
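# Earliest and latest review dates
date_1 <- min(review_clean$last)
date_2 <- max(review_clean$last)
# Number of private-room listings
rom <- as.numeric(sum(room$room_types == 'Private Room', na.rm = TRUE))
# Average nightly price across all listings
avg <- mean(price_listing_clean$price_clean)
# Collect the four summary values in a one-row tibble
review_dates <- tibble(first_reviewed = date_1, last_reviewed = date_2,
                       nb_private_rooms = rom, avg_price = avg)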
review_dates
# Alternative: join the three tables on listing_id, then summarise in one step
joint_tab <- price_listing_clean %>%
  inner_join(room, by = 'listing_id') %>%
  inner_join(review_clean, by = 'listing_id')
review_date <- joint_tab %>%
  summarise(first_reviewed = min(last),
            last_reviewed = max(last),
            nb_private_rooms = as.numeric(sum(room_types == 'Private Room', na.rm = TRUE)),
            avg_price = mean(price_clean))
review_date
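The per-file summary and the joined summary agree only when every listing_id appears in all three files, because inner_join() drops rows without a match. The sketch below illustrates this with hypothetical toy tables (prices, reviews, and their values are invented for the example, not taken from the Airbnb data).

```r
suppressMessages(library(dplyr))

# Hypothetical toy tables: listing 3 has a price but no review record
prices  <- tibble(listing_id = c(1, 2, 3), price_clean = c(100, 200, 600))
reviews <- tibble(listing_id = c(1, 2),
                  last = as.Date(c("2019-01-05", "2019-06-30")))

# Average over every row of the price table
avg_all <- mean(prices$price_clean)

# Average over only the listings that survive the inner join
joined <- inner_join(prices, reviews, by = "listing_id")
avg_joined <- mean(joined$price_clean)

# Listings present in prices but missing from reviews
unmatched <- anti_join(prices, reviews, by = "listing_id")

c(all = avg_all, joined = avg_joined, dropped = nrow(unmatched))
```

Running anti_join() on the real tables before joining is one way to confirm whether any listings would be dropped, and therefore whether the two summaries should be expected to match.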