Live Code-Along: SQL for Absolute Beginners
Key session takeaways
- Learn the basics of SQL & why it's foundational for data science work.
- Filter, group by, and analyse datasets using SQL.
- Answer key business questions with SQL
The Dataset
The dataset to be used in this training is a CSV file named airbnb_data.csv
, which contains data on airbnb listings in the state of New York. It contains the following columns:
listing_id
: The unique identifier for a listingdescription
: The description used on the listinghost_id
: Unique identifier for a hostneighbourhood_full
: Name of boroughs and neighbourhoodscoordinates
: Coordinates of listing (latitude, longitude)listing_added
: Date of added listingroom_type
: Type of roomrating
: Rating from 0 to 5.price
: Price per night for listingnumber_of_reviews
: Amount of reviews receivedreviews_per_month
: Number of reviews per monthavailability_365
: Number of days available per yearnumber_of_stays
: Total number of stays thus far
Questions to answer
- Question 1: List the top 10 most reviewed private rooms
- Question 2: What are the cheapest 10 private rooms in New York?
- Question 3: What is the average availability of a private room in New York?
Q&A
- Question 4: Which listings have an availability of fewer than 30 days a year but have fewer than 10 reviews?
- Question 5: What is the average number of reviews per room type, ordered by the average in descending order?
- Question 6: What is the number and average price of listings by room type where such listings are available for more than 250 days a year?
Some SQL Basics
SQL Commands
Before we dive into answering our questions, let's get familiar with SQL syntax. Today, we'll be covering the most useful SQL commands — let's start with them one by one. This includes the following:
SELECT
: returns either all columns using * or specific columns as specified, seperated by a comma.FROM
: specifies the table that the data should be returned from.LIMIT
: limits the number of rows returned.AS
: lets you set an alias for a particular columnORDER BY
: returns the data sorted by column specified. Can be sorted inASC
(ascending) orDESC
(descending).WHERE
: lets you filter on a specific column or valueGROUP BY
: lets you aggregate on one or more columnsMAX()
: returns the maximum amount of a particular columnAVG()
: returns the total average of a particular column
Code Commenting
There are two types of code commenting in Postgres
-- Inline commenting
: Used for quick, short notes
"/" Multi line commenting
"/" (use without quotation marks): Used for longer comments, such as metadata, or code headers including the author, date, purpose, etc.
-- SELECT the first 10 rows of all the columns from the airbnb dataset
SELECT *
FROM 'airbnb_data.csv'
LIMIT 10;
-- SELECT the first 10 rows of the listing_id, description, and neighbourhood_full columns
SELECT listing_id, description, neighbourhood_full
FROM airbnb_data.csv
LIMIT 10;
-- Set an alias for listing_id, description and neighbourhood_full
SELECT listing_id AS Listing_Id, description AS Desc, neighbourhood_full AS Location
FROM airbnb_data.csv;
-- Sort the output by number of stays
SELECT *
FROM airbnb_data.csv
ORDER BY number_of_stays DESC;
-- Filter room type by private room
SELECT *
FROM airbnb_data.csv
WHERE room_type = 'Private Room';
-- Get the average price for all rooms
SELECT AVG(price) AS Avg_Price
FROM airbnb_data.csv;
-- Get the average price per room type
SELECT room_type, AVG(price) As avg_price
FROM airbnb_data.csv
GROUP BY 1;
Q&A
Let's answer some questions!
Question 1: List the top 10 most reviewed private rooms
In order to answer this question, we need to filter on room_type
and order by reviews_per_month
, and limit to the top 10 results. Here are the commands we need to use:
SELECT
: returns either all columns using * or specific columns as specified, seperated by a comma.FROM
: specifies the table that the data should be returned from.WHERE
: specifies a condition on a columnORDER BY
: returns the data sorted by column specified. Can be sorted inASC
(ascending) orDESC
(descending).LIMIT
: returns the specified number of rows
-- List the top 10 most reviewed private rooms
SELECT *
FROM airbnb_data.csv
WHERE room_type = 'Private Room'
ORDER BY reviews_per_month DESC
LIMIT 10;
Question 2: What are the cheapest 10 private rooms in New York?
In order to answer this question, we need to filter on room_type
and order by price
, and limit to the top 10 results. Here are the commands we need to use:
SELECT
: returns either all columns using * or specific columns as specified, seperated by a comma.FROM
: specifies the table that the data should be returned from.WHERE
: specifies a condition on a columnORDER BY
: returns the data sorted by column specified. Can be sorted inASC
(ascending) orDESC
(descending).LIMIT
: returns the specified number of rows
-- List the top 10 cheapest private rooms in New York
SELECT *
FROM airbnb_data.csv
WHERE room_type = 'Private Room'
ORDER BY price ASC
LIMIT 10;