Airbnb Listings
This dataset consists of six files containing Airbnb rental listings from six cities: Austin, Bangkok, Buenos Aires, Cape Town, Istanbul, and Melbourne. You can access them via the File menu, or under Files in the Context Panel at the top right of the screen, next to Report. The data dictionary and file names are listed at the bottom of this workbook.
Each row represents a listing with details such as coordinates, neighborhood, host id, price per night, number of reviews, and so on.
We've added some guiding questions for analyzing this exciting dataset! Feel free to make this workbook yours by adding and removing cells, or editing any of the existing cells.
Explore this dataset
Here are some ideas to get you started with your analysis...
- 🗺️ Explore: What is the distribution of prices across a city's neighborhoods? How does it change when you segment it further by room_type?
- 📊 Visualize: Create a map with a dot for each listing in a city, and color the dots on a scale based on price.
- 🔎 Analyze: How do listings that require a minimum stay of a week or longer differ from those that don't?
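The Explore and Analyze ideas above can be started with two grouped summaries. This is a minimal sketch, assuming `listings` is a single city's DataFrame whose `price` column is already numeric; the function names and the seven-night cutoff (reading "a week or longer" as `minimum_nights >= 7`) are illustrative choices, not part of the dataset.

```python
import pandas as pd


def price_by_neighbourhood(listings):
    """Median nightly price and listing count per neighbourhood and room_type."""
    return (
        listings.groupby(["neighbourhood", "room_type"])["price"]
        .agg(median_price="median", n_listings="size")
        .reset_index()
        # Within each neighbourhood, list the most expensive room type first
        .sort_values(["neighbourhood", "median_price"], ascending=[True, False])
    )


def compare_minimum_stay(listings, min_nights=7):
    """Contrast listings requiring at least `min_nights` nights with the rest."""
    segment = (listings["minimum_nights"] >= min_nights).map(
        {True: f">= {min_nights} nights", False: f"< {min_nights} nights"}
    ).rename("minimum_stay")
    return listings.groupby(segment).agg(
        n_listings=("price", "size"),
        median_price=("price", "median"),
        median_reviews=("number_of_reviews", "median"),
        median_availability_365=("availability_365", "median"),
    )
```

For example, after loading a city with `pd.read_csv`, `price_by_neighbourhood(austin)` gives one row per neighbourhood and room type. Medians are used because they are less sensitive than means to a handful of very expensive listings.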
🔍 Scenario: Identify Trends Across Listings Operated by Inferred Professional Hosts
This scenario helps you develop an end-to-end project for your portfolio.
Background: An international real estate firm has hired you to research professional hosting on Airbnb. These are hosts that have multiple listings, make considerable income from their listings, and often manage teams to operate their listings. Examples include property managers and hospitality business owners.
Objective: Using the data from all six cities, infer which listings are operated by professional hosts based on the distribution of calculated_host_listings_count. The lead consultant wants to know whether you can identify trends across listings operated by inferred professional hosts, and would also like an estimate of the percentage of Airbnb listings operated by professional hosts.
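One way to approach this objective is to inspect the host-level distribution of calculated_host_listings_count, choose a cutoff, and measure the share of listings above it. This is a sketch under stated assumptions: `threshold` is a hypothetical cutoff you pick after looking at the quantiles (it is not an official definition of a professional host), and the `city` column only exists if you add it yourself when combining the six files.

```python
import pandas as pd


def host_count_quantiles(listings, qs=(0.5, 0.75, 0.9, 0.95, 0.99)):
    """Quantiles of calculated_host_listings_count with one row per host.

    Counting each host once avoids weighting a host by its number of
    listings. The count is per city, so hosts are deduplicated within the
    (hypothetical) `city` column when it is present.
    """
    keys = ["city", "host_id"] if "city" in listings.columns else ["host_id"]
    per_host = listings.drop_duplicates(subset=keys)
    return per_host["calculated_host_listings_count"].quantile(list(qs))


def is_professional(listings, threshold):
    """Boolean mask: True where the host has at least `threshold` listings."""
    return listings["calculated_host_listings_count"] >= threshold


def professional_share(listings, threshold, by=None):
    """Fraction of listings operated by inferred professional hosts."""
    mask = is_professional(listings, threshold)
    if by is None:
        return mask.mean()
    # One share per group, e.g. per city
    return mask.groupby(listings[by]).mean()


def compare_segments(listings, threshold):
    """Median characteristics of professional vs. other listings."""
    segment = is_professional(listings, threshold).map(
        {True: "professional", False: "other"}
    ).rename("host_type")
    return listings.groupby(segment).agg(
        n_listings=("price", "size"),
        median_price=("price", "median"),
        median_availability_365=("availability_365", "median"),
        median_minimum_nights=("minimum_nights", "median"),
        median_reviews_ltm=("number_of_reviews_ltm", "median"),
    )
```

A typical sequence is `host_count_quantiles(listings)` to see where the long tail begins, then `professional_share(listings, threshold=..., by="city")` with the cutoff you justify in your report. Because `price` is in local currency, run `compare_segments` separately for each city rather than on the pooled data.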
You will need to prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
You can query the pre-loaded CSV files using SQL directly. Here’s a sample query, followed by some sample Python code:
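SELECT * FROM 'data/listings_austin.csv'
LIMIT 3

import pandas as pd

# Load the Austin listings, using the first column (id) as the index
austin = pd.read_csv("data/listings_austin.csv", index_col=0)
# Preview the first 100 rows
austin.head(100)

Other cities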
Besides listings_austin.csv, the files include listings_bangkok.csv, listings_buenoes_aires.csv, listings_cape_town.csv, listings_istanbul.csv, and the Melbourne listings. If you want data on other locations, visit the source of the dataset, InsideAirbnb, and upload it to your workspace.
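Since the scenario uses all six cities, it helps to stack the files into one DataFrame. This sketch assumes every file sits in `data/` and matches `listings_*.csv`, as the Austin file does; `load_all_cities` and the added `city` column are hypothetical names. It does not pass `index_col=0`, so `id` stays a regular column and `pd.concat` builds a fresh row index.

```python
import glob
import os

import pandas as pd


def load_all_cities(pattern="data/listings_*.csv"):
    """Stack every matching listings file into one DataFrame with a `city` column.

    The city label is taken from the file name by stripping the
    `listings_` prefix and `.csv` suffix, so it keeps whatever spelling
    the file names use.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        city = os.path.basename(path)[len("listings_"):-len(".csv")]
        frames.append(pd.read_csv(path).assign(city=city))
    if not frames:
        raise FileNotFoundError(f"No files match {pattern!r}")
    return pd.concat(frames, ignore_index=True)
```

Keep in mind that `price` is in each city's local currency, so price comparisons across cities need a currency conversion or should be made within each city.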
Data Dictionary
| Column | Explanation |
|---|---|
| id | Airbnb's unique identifier for the listing |
| name | Name of the listing |
| host_id | Airbnb's unique identifier for the host |
| host_name | Name of the host |
| neighbourhood_group | The neighbourhood group as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles. |
| neighbourhood | The neighbourhood as geocoded using the latitude and longitude against neighborhoods as defined by open or public digital shapefiles. |
| latitude | Uses the World Geodetic System (WGS84) projection for latitude and longitude. |
| longitude | Uses the World Geodetic System (WGS84) projection for latitude and longitude. |
| room_type | Type of space offered: entire home/apt, private room, shared room, or hotel room |
| price | Daily price in local currency. Note that a $ sign may be used regardless of locale. |
| minimum_nights | Minimum number of nights' stay for the listing (calendar rules may differ) |
| number_of_reviews | The number of reviews the listing has |
| last_review | The date of the last/newest review |
| calculated_host_listings_count | The number of listings the host has in the current scrape, in the city/region geography. |
| availability_365 | The number of days in the next 365 that the listing is available, as determined by the calendar. Note that a listing may be unavailable because it has been booked by a guest or blocked by the host. |
| number_of_reviews_ltm | The number of reviews the listing has (in the last 12 months) |
| license | License or registration number for the listing, where applicable |
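The note on `price` means the column may arrive as text in some files. A defensive cleaning step, sketched here as a hypothetical `clean_listings` helper, strips `$` signs and commas before converting to numbers and parses `last_review` as a date. It assumes any commas in prices are thousands separators; check a few raw values per city before relying on that.

```python
import pandas as pd


def clean_listings(listings):
    """Return a copy with numeric `price` and datetime `last_review`.

    Values that still fail to parse become NaN/NaT instead of raising.
    """
    out = listings.copy()
    if not pd.api.types.is_numeric_dtype(out["price"]):
        # Remove currency signs and (assumed) thousands separators
        stripped = out["price"].astype(str).str.replace(r"[$,]", "", regex=True)
        out["price"] = pd.to_numeric(stripped, errors="coerce")
    out["last_review"] = pd.to_datetime(out["last_review"], errors="coerce")
    return out
```

Numeric `price` columns pass through unchanged, so the same helper can be applied to every city's file.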
The data for each city was compiled by InsideAirbnb between October and November 2021.