Skip to content
Begineer SQL Analysis of AirBnB Listings
A Beginner's Data Analysis with SQL
The Dataset
The dataset to be used in this training is a CSV file named airbnb_data.csv, which contains data on airbnb listings in the state of New York. It contains the following columns:
listing_id: The unique identifier for a listingdescription: The description used on the listinghost_id: Unique identifier for a hostneighbourhood_full: Name of boroughs and neighbourhoodscoordinates: Coordinates of listing (latitude, longitude)listing_added: Date of added listingroom_type: Type of roomrating: Rating from 0 to 5.price: Price per night for listingnumber_of_reviews: Amount of reviews receivedreviews_per_month: Number of reviews per monthavailability_365: Number of days available per yearnumber_of_stays: Total number of stays thus far
Questions to answer
- Question 1: List the top 10 most reviewed private rooms
- Question 2: What are the cheapest 10 private rooms in New York?
- Question 3: What is the average availability of a private room in New York?
Q&A
- Question 4: Which listings have an availability of fewer than 30 days a year but have fewer than 10 reviews?
- Question 5: What is the average number of reviews per room type, ordered by the average in descending order?
- Question 6: What is the number and average price of listings by room type where such listings are available for more than 250 days a year?
Let's Practice Some SQL!
DataFrameas
df
variable
-- SELECT the first 10 rows of all the columns from the airbnb
SELECT *
FROM 'airbnb_data.csv'
LIMIT 10;DataFrameas
df1
variable
-- SELECT the first 10 rows of the listing_id, description, and neighbourhood_full columns
SELECT listing_id, description, neighbourhood_full
FROM 'airbnb_data.csv'
LIMIT 10; DataFrameas
df2
variable
-- Set an alias for listing_id, description and neighbourhood_full
SELECT listing_id AS 'Listing ID',
description AS 'Description',
neighbourhood_full AS 'Full Neighbourhood'
FROM 'airbnb_data.csv'
LIMIT 10; DataFrameas
df3
variable
-- Sort the output by number of stays
SELECT *
FROM 'airbnb_data.csv'
ORDER BY number_of_stays
LIMIT 10; DataFrameas
df4
variable
-- Filter room type by private room
SELECT *
FROM 'airbnb_data.csv'
WHERE room_type = 'Private Room'
LIMIT 10; DataFrameas
df5
variable
-- Get the average price for all rooms
SELECT AVG(price) AS 'Average Price'
FROM 'airbnb_data.csv';DataFrameas
df9
variable
-- Get the average price per room type
SELECT room_type, AVG(price) AS 'Average Price'
FROM 'airbnb_data.csv'
GROUP BY room_type;Q&A
Let's answer some questions!
Question 1: List the top 10 most reviewed private rooms
In order to answer this question, we need to filter on room_type and order by reviews_per_month, and limit to the top 10 results. Here are the commands we need to use:
SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.FROM: specifies the table that the data should be returned from.WHERE: specifies a condition on a columnORDER BY: returns the data sorted by column specified. Can be sorted inASC(ascending) orDESC(descending).LIMIT: returns the specified number of rows
DataFrameas
df10
variable
-- List the top 10 most reviewed private rooms
SELECT *
FROM 'airbnb_data.csv'
WHERE room_type == 'Private Room'
ORDER BY reviews_per_month DESC
LIMIT 10; Question 2: What are the cheapest 10 private rooms in New York?
In order to answer this question, we need to filter on room_type and order by price, and limit to the top 10 results. Here are the commands we need to use:
SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.FROM: specifies the table that the data should be returned from.WHERE: specifies a condition on a columnORDER BY: returns the data sorted by column specified. Can be sorted inASC(ascending) orDESC(descending).LIMIT: returns the specified number of rows
DataFrameas
df11
variable
-- List the top 10 cheapest private rooms in New York
SELECT *
FROM 'airbnb_data.csv'
WHERE room_type == 'Private Room'
ORDER BY price
LIMIT 10; Question 3: What is the average availability of a private room in New York?
In order to answer this question, we need to filter on room_type and compute the average of availability_365. Here are the commands we need to use:
SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.FROM: specifies the table that the data should be returned from.WHERE: specifies a condition on a columnAVG: returns the average of a column