Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, with stays ranging from a few nights to many months. In this project, you will take a closer look at the New York Airbnb market by combining data from multiple file types: .csv, .tsv, and .xlsx (Excel files).
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv: This is a CSV file containing data on Airbnb listing prices and locations.
listing_id: unique identifier of listing
price: nightly listing price in USD
nbhood_full: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx: This is an Excel file containing data on Airbnb listing descriptions and room types.
listing_id: unique identifier of listing
description: listing description
room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv: This is a TSV file containing data on Airbnb host names and review dates.
listing_id: unique identifier of listing
host_name: name of listing host
last_review: date when the listing was last reviewed
# We've loaded the necessary packages for you in the first cell. Please feel free to add as many cells as you like!
suppressMessages(library(dplyr)) # This line is required to check your answer correctly
options(readr.show_types = FALSE) # This line is required to check your answer correctly
library(readr)
library(readxl)
library(stringr)
library(lubridate, verbose = FALSE)
library(tidyr, verbose = FALSE)
# Importing the datasets
airbnb_price = read_csv("data/airbnb_price.csv", show_col_types = FALSE)
airbnb_room_type = read_excel("data/airbnb_room_type.xlsx")
airbnb_last_review = read_tsv("data/airbnb_last_review.tsv", show_col_types = FALSE)
glimpse(airbnb_price)
glimpse(airbnb_room_type)
glimpse(airbnb_last_review)
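Since all three tables share the listing_id column, they can be combined into one table with joins. The following is only a minimal sketch on made-up toy data: the *_toy names and every value in them are hypothetical stand-ins, not rows from the real files, and inner_join() is one reasonable choice rather than the required one.

```r
library(dplyr)

# Toy stand-ins for the three tables (all values are invented for illustration)
price_toy = data.frame(
  listing_id = c(1, 2, 3),
  price = c("100 dollars", "80 dollars", "150 dollars")
)
room_toy = data.frame(
  listing_id = c(1, 2, 3),
  room_type = c("private room", "shared room", "entire home/apt")
)
review_toy = data.frame(
  listing_id = c(1, 2),
  last_review = c("May 21 2019", "June 05 2019")
)

# Keep only the listings that appear in all three tables
listings_toy = price_toy %>%
  inner_join(room_toy, by = "listing_id") %>%
  inner_join(review_toy, by = "listing_id")

listings_toy
```

An inner join drops listings that are missing from any table (listing 3 has no review row here). If every listing should be kept, left_join() could be used instead.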
Let's look at the date range of the reviews in these listings.
# Converting last_review into date format
airbnb_last_review = airbnb_last_review %>%
  mutate(last_review = parse_date_time(last_review, orders = "B d y"))
# Getting the dates of the earliest and most recent reviews
earliest_review = min(airbnb_last_review$last_review)
recent_review = max(airbnb_last_review$last_review)
earliest_review
recent_review
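To make the date conversion easier to follow, here is a small sketch on a toy vector. It assumes the review dates are written as a month name, a day, and a year (for example "May 21 2019"), which is what the "B d y" order string implies. The review_dates_toy values are invented, and month names are matched in the current locale, so an English locale is assumed.

```r
library(lubridate)

# Hypothetical date strings in "month-name day year" form
review_dates_toy = c("May 21 2019", "January 01 2019", "July 09 2019")

# B = month name, d = day of month, y = year
parsed_toy = parse_date_time(review_dates_toy, orders = "B d y")

# Earliest and most recent dates; na.rm = TRUE skips any strings that failed to parse
earliest_toy = min(parsed_toy, na.rm = TRUE)
recent_toy = max(parsed_toy, na.rm = TRUE)

earliest_toy
recent_toy
```

parse_date_time() returns date-times (POSIXct, in UTC by default). Wrapping the result in as.Date() gives plain dates if the time component is not needed.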
Now, let's see the types of rooms listed.
airbnb_room_type %>%
  count(room_type)
Oh no! The categories are quite messy, so let's tidy them up and standardize them a little.
# First make letter case consistent
airbnb_room_type = airbnb_room_type %>%
  mutate(room_type = str_to_lower(room_type))
Now, let's see if this was enough
# Let's see how much it improved
airbnb_room_type %>%
  count(room_type)
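One way to check whether lowercasing was enough is to compare the number of distinct categories before and after. This is a sketch on a toy column. The rooms_toy values are invented and only assume that the mess comes from inconsistent letter case.

```r
library(dplyr)
library(stringr)

# Hypothetical messy categories that differ only in letter case
rooms_toy = data.frame(
  room_type = c("Private room", "PRIVATE ROOM", "private room",
                "Shared room", "shared room")
)

# Lowercase everything so the case variants collapse into one label each
rooms_clean = rooms_toy %>%
  mutate(room_type = str_to_lower(room_type))

# Distinct labels before and after cleaning
n_before = n_distinct(rooms_toy$room_type)
n_after = n_distinct(rooms_clean$room_type)

# Comparing a column to a label gives TRUE/FALSE values; sum() counts the TRUEs
private_toy = sum(rooms_clean$room_type == "private room")
```

If the real column also had stray spaces around the labels, str_trim() from stringr could be added to the same mutate() call.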
Great! It seems to have worked. Now, let's count the private rooms only.
private_rooms = sum(airbnb_room_type$room_type == "private room")
private_rooms
Now, let's explore the average listing price, but first, we should put the variable in an appropriate format.
# Separate the column into two: numeric_price and unit
airbnb_price = airbnb_price %>%
  separate(col = price, into = c("numeric_price", "unit"), sep = " ", remove = FALSE) %>%
  mutate(numeric_price = as.numeric(numeric_price))
# Average price
avg_price = mean(airbnb_price$numeric_price)
avg_price
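The price step can be tried on a toy column first. This sketch assumes each price is stored as text in the form "<number> <unit>", separated by a single space, which is what the separate() call above relies on. The prices_toy values, including the "dollars" unit, are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical price strings in "<number> <unit>" form
prices_toy = data.frame(price = c("225 dollars", "89 dollars", "200 dollars"))

# Split on the space, keep the original column, and convert the number part
prices_toy = prices_toy %>%
  separate(col = price, into = c("numeric_price", "unit"), sep = " ", remove = FALSE) %>%
  mutate(numeric_price = as.numeric(numeric_price))

# na.rm = TRUE ignores prices that could not be converted to a number
avg_toy = mean(prices_toy$numeric_price, na.rm = TRUE)
avg_toy
```

Without na.rm = TRUE, a single missing or unparseable price would make the whole mean NA, so it is worth checking sum(is.na(numeric_price)) on the real data before trusting the average.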