Welcome to New York City, one of the most-visited cities in the world. There are many Airbnb listings in New York City to meet the high demand for temporary lodging for travelers, which can be anywhere between a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types like .csv
, .tsv
, and .xlsx
.
Recall that CSV, TSV, and Excel files are three common formats for storing data. Three files containing data on 2019 Airbnb listings are available to you:
data/airbnb_price.csv This is a CSV file containing data on Airbnb listing prices and locations.
listing_id
: unique identifier of listingprice
: nightly listing price in USDnbhood_full
: name of borough and neighborhood where listing is located
data/airbnb_room_type.xlsx This is an Excel file containing data on Airbnb listing descriptions and room types.
listing_id
: unique identifier of listingdescription
: listing descriptionroom_type
: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments
data/airbnb_last_review.tsv This is a TSV file containing data on Airbnb host names and review dates.
listing_id
: unique identifier of listinghost_name
: name of listing hostlast_review
: date when the listing was last reviewed
# Import necessary packages
import pandas as pd
import numpy as np
from datetime import datetime
# Begin coding here ...
# Use as many cells as you like
#Carregar arquivos
prices = pd.read_csv('data/airbnb_price.csv')
room_types = pd.read_excel('data/airbnb_room_type.xlsx')
review_date = pd.read_csv('data/airbnb_last_review.tsv',sep="\t")
############### What are the dates of the earliest and most recent reviews? ###############
############### Store these values as two separate variables with your preferred names. ###############
#Estabelecer variavel com a data de hoje, deixar no formato da tabela review_date e transformar a variavel em datetime
today = datetime.now()
today = today.strftime("%B %d %Y")
today = pd.to_datetime(today, format = "%B %d %Y")
#Converter coluna last_review em formato datetime
review_date['last_review'] = pd.to_datetime(review_date['last_review'], format = "%B %d %Y")
#Ordenar valores para verificar o review mais antigo e o mais recente. Argumento ignore index restaura os index na nova ordem estabelecida
review_date = review_date.sort_values(by = 'last_review',ascending = True, ignore_index = True)
earliest_review = review_date.loc[0,'last_review']
latest_review = review_date.loc[review_date.shape[0]-1,'last_review']
#################### How many of the listings are private rooms? Save this into any variable. ####################
#Tratar a coluna 'room_type' da tabela room_types para que lower case e upper case values nao sejam afetados
room_types['room_type'] = room_types['room_type'].str.lower()
#Contar valores de cada tipo de quarto
# room_types['room_type'].value_counts() # para saber contagem de cada tipo de quarto
room_types.loc[room_types['room_type'] == 'private room','room_type'].value_counts() # contagem exata dos private rooms
private_rooms = room_types.loc[room_types['room_type'] == 'private room','room_type'].value_counts().values[0]
#################### What is the average listing price?
#################### Round to the nearest two decimal places and save into a variable ####################
#Alterar coluna price da tabela prices para que seja int
prices['price'] = prices['price'].str.strip(' dollars')
prices['price'] = prices['price'].astype('int')
#Calcular o ticket medio dos precos
ticket_medio = round(prices['price'].mean(),2)
##################### Combine the new variables into one DataFrame called review_dates with four columns in the following order: ##################### first_reviewed, last_reviewed, nb_private_rooms, and avg_price.
##################### The DataFrame should only contain one row of values.
review_dates = [[earliest_review,latest_review,private_rooms,ticket_medio]]
review_dates = pd.DataFrame(review_dates, columns = ['first_reviewed','last_reviewed','nb_private_rooms','avg_price'])
################################################ TEST SESSION ################################################
print(prices.head())
print(room_types.head())
print(review_date.tail())
print(review_dates)