Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types like .csv
, .tsv
, and .xlsx
.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv This is a CSV file containing data on Airbnb listing prices and locations.
listing_id
: unique identifier of listingprice
: nightly listing price in USDnbhood_full
: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
listing_id
: unique identifier of listingdescription
: listing descriptionroom_type
: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
listing_id
: unique identifier of listinghost_name
: name of listing hostlast_review
: date when the listing was last reviewed
# Import necessary packages
import pandas as pd
import numpy as np
# Begin coding here ...
# Use as many cells as you like
#abrir arquivo em xlsx
import pandas as pd
file='data/airbnb_room_type.xlsx'
dataexcel=pd.read_excel(file)
print(dataexcel.head())
#abrir arquivo em csv
file2='data/airbnb_price.csv'
datacsv=pd.read_csv(file2)
print(datacsv.head())
#abrir arquivo em tsv
file3='data/airbnb_last_review.tsv'
datatsv=pd.read_csv(file3,sep='\t')
print(datatsv.head())
#verificação se todas as bases só tem ids únicos
print(len(datatsv['listing_id'].unique()))
print(datatsv.shape[0])
print(len(datacsv['listing_id'].unique()))
print(datacsv.shape[0])
print(len(dataexcel['listing_id'].unique()))
print(dataexcel.shape[0])
#mergeando todas as bases
data=datatsv.merge(datacsv,on='listing_id')
data=data.merge(dataexcel,on='listing_id')
#print(data.head())
#print(data.shape[0])
#print(len(data['listing_id'].unique()))
#1. descobrindo a data mais e menos recente das reviews
data['last_review']=pd.to_datetime(data['last_review'])
data2=data.sort_values(by='last_review').reset_index()
r1=data2['last_review'].iloc[0]
r2=data2['last_review'].iloc[-1]
#quantidade de private rooms
room=data['room_type'].str.lower()
a=room.value_counts()
r3=a.iloc[1]
#media de preço
b=data['price'].str.strip(' dollars').astype(int)
r4=b.mean().round(2)
#jogar tudo num dataframe
valores=[r1,r2,r3,r4]
review_dates=pd.DataFrame([valores],columns=['first_reviewed','last_reviewed','nb_private_rooms','avg_price'])
print(review_dates)
#data da review mais recente e da menos recente
#print(data.info())
#print(data2['last_review'])
#print(data['last_review'].value_counts())
#print(data2['last_review'].head(1))
#print(data2['last_review'].tail(1))
#print(data2['last_review'].iloc[-1])
#print(r1)
#print(r2)
#print(data['room_type'].value_counts())
#print(room.value_counts())