Skip to content
# Import necessary packages
import pandas as pd
import numpy as np
# Begin coding here ...
# Use as many cells as you like
price = pd.read_csv('data/airbnb_price.csv')
price.head()
Hidden output
room = pd.read_excel('data/airbnb_room_type.xlsx')
room.head()
Hidden output
review = pd.read_csv('data/airbnb_last_review.tsv',delimiter='\t')
review.head()
Hidden output

Merging the three dataframe using the primary key "listing_id"

price_room = pd.merge(price, room, how='left', on = "listing_id")
price_room_review = pd.merge(price_room, review, how='left', on= "listing_id")
price_room_review.head()

Converting "last_review" to date format and Max Min dates

pd.to_datetime(price_room_review['last_review'])
min_date = pd.to_datetime(price_room_review['last_review']).min()
max_date = pd.to_datetime(price_room_review['last_review']).max()

Cleaning the data to get a reliable count: standardizing capitalization in string data to all lower case

room_type = price_room_review['room_type'].str.lower() == 'private room'
room_type.unique()
Hidden output
count = price_room_review[price_room_review['room_type'].str.lower() == 'private room'].shape[0]
price_float = price_room_review['price'].str.replace('dollars', '').astype(float)
avg = price_float.mean()
review_dates = pd.DataFrame({ 
    'first_reviewed': [min_date],
    'last_reviewed': [max_date],
    'nb_private_rooms': [count],
    'avg_price': [round(avg, 2)]
})
print (review_dates)