Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, with stays ranging anywhere from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types like .csv, .tsv, and .xlsx.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv This is a CSV file containing data on Airbnb listing prices and locations.
listing_id: unique identifier of listing
price: nightly listing price in USD
nbhood_full: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
listing_id: unique identifier of listing
description: listing description
room_type: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
listing_id: unique identifier of listing
host_name: name of listing host
last_review: date when the listing was last reviewed
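All three files carry a listing_id column, so one way to combine them is to join on that key. The sketch below is only an illustration: the rows are made-up placeholder values held in memory, not data from the real files, and the name `listings` is hypothetical.

```python
import pandas as pd

# Hypothetical rows that mimic the three files' columns; all values are invented.
price = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "price": ["225 dollars", "89 dollars", "200 dollars"],
    "nbhood_full": ["Borough A, Area 1", "Borough B, Area 2", "Borough A, Area 3"],
})
room_type = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "description": ["Listing one", "Listing two", "Listing three"],
    "room_type": ["Entire home/apt", "private room", "Shared room"],
})
reviews = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "host_name": ["Host A", "Host B", "Host C"],
    "last_review": ["2019-05-21", "2019-01-01", "2019-07-09"],
})

# Inner joins keep only the listings that appear in every table.
listings = (
    price
    .merge(room_type, on="listing_id", how="inner")
    .merge(reviews, on="listing_id", how="inner")
)
print(listings.shape)  # (3, 7)
```

An inner join is assumed here for simplicity; if some listings are missing from one file, `how="left"` or `how="outer"` would keep them with missing values instead.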
# We've loaded your first package for you! You can add as many cells as you need.
import numpy as np
# Begin coding here ...
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load in the price data
price = pd.read_csv('data/airbnb_price.csv')
price.head()
# Load in room_type data
room_type = pd.read_excel('data/airbnb_room_type.xlsx')
room_type.head()
# Load in review data
reviews = pd.read_csv('data/airbnb_last_review.tsv', delimiter="\t")
reviews.head()
# What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names.
last_review_dates = pd.to_datetime(reviews["last_review"])
# Take the values from the data instead of typing them in (2019-01-01 and 2019-07-09 for this file)
earliest_review = last_review_dates.min()
latest_review = last_review_dates.max()
# How many of the listings are private rooms? Save this into any variable.
room_type["room_type"] = room_type["room_type"].str.title().str.strip()
private_rooms = room_type.query("room_type == 'Private Room'")["room_type"].count()
private_rooms
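The cleaning step above matters because the same category can be spelled in several ways. A minimal sketch of the effect, assuming the column mixes letter case and stray whitespace (the sample labels below are invented for illustration, not copied from the file):

```python
import pandas as pd

# Hypothetical labels with inconsistent case and padding.
raw = pd.Series(["Private room", "PRIVATE ROOM", " private room", "Entire home/apt", "shared room "])

# Trim whitespace and normalise the case so equal categories compare equal.
clean = raw.str.strip().str.title()

counts = clean.value_counts()
n_private = int((clean == "Private Room").sum())
print(n_private)  # 3
```

Without the normalisation, the three spellings of "private room" would be counted as three different categories.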
# What is the average listing price? Round to the nearest penny and save into a variable.
# Drop the "dollars" suffix and any leftover whitespace, then convert to a numeric type
price["price"] = price["price"].str.replace("dollars", "", regex=False).str.strip().astype("float")
average_price = round(price["price"].mean(), 2)
average_price
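The price conversion can be checked on a small example. This sketch assumes prices are stored as strings such as "225 dollars", which is inferred from the replace call above; the three sample values are made up.

```python
import pandas as pd

# Hypothetical price strings in the assumed "<number> dollars" format.
prices = pd.Series(["225 dollars", "89 dollars", "200 dollars"])

# Remove the suffix literally (no regex), trim whitespace, and parse as numbers.
numeric = pd.to_numeric(prices.str.replace("dollars", "", regex=False).str.strip())

avg = round(numeric.mean(), 2)
print(avg)  # 171.33
```

`pd.to_numeric` is used here instead of `astype("float")` as an alternative that also accepts `errors="coerce"` when some entries might not parse.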
# Combine the new variables into one DataFrame called review_dates with four columns in the following order: first_reviewed, last_reviewed, nb_private_rooms, and avg_price. The DataFrame should only contain one row of values.
# Build the one-row frame directly so each column keeps its natural dtype
review_dates = pd.DataFrame({"first_reviewed": [earliest_review], "last_reviewed": [latest_review],
                             "nb_private_rooms": [private_rooms], "avg_price": [average_price]})
review_dates