Skip to content
Project: Exploring Airbnb Market Trends
# Import necessary packages
import pandas as pd
import numpy as np
# Begin coding here ...
# Use as many cells as you like
price = pd.read_csv('data/airbnb_price.csv')
price.head()
Hidden output
room = pd.read_excel('data/airbnb_room_type.xlsx')
room.head()
Hidden output
review = pd.read_csv('data/airbnb_last_review.tsv',delimiter='\t')
review.head()
Hidden output
Merging the three dataframe using the primary key "listing_id"
price_room = pd.merge(price, room, how='left', on = "listing_id")
price_room_review = pd.merge(price_room, review, how='left', on= "listing_id")
price_room_review.head()
Converting "last_review" to date format and Max Min dates
pd.to_datetime(price_room_review['last_review'])
min_date = pd.to_datetime(price_room_review['last_review']).min()
max_date = pd.to_datetime(price_room_review['last_review']).max()
Cleaning the data to get a reliable count: standardizing capitalization in string data to all lower case
room_type = price_room_review['room_type'].str.lower() == 'private room'
room_type.unique()
Hidden output
count = price_room_review[price_room_review['room_type'].str.lower() == 'private room'].shape[0]
price_float = price_room_review['price'].str.replace('dollars', '').astype(float)
avg = price_float.mean()
review_dates = pd.DataFrame({
'first_reviewed': [min_date],
'last_reviewed': [max_date],
'nb_private_rooms': [count],
'avg_price': [round(avg, 2)]
})
print (review_dates)