Skip to content
New Workbook
Sign up
SQL for Absolute Beginners

Live Code-Along: SQL for Absolute Beginners

Key session takeaways

  • Learn the basics of SQL & why it's foundational for data science work.
  • Filter, group by, and analyse datasets using SQL.
  • Answer key business questions with SQL

The Dataset

The dataset to be used in this training is a CSV file named airbnb_data.csv, which contains data on airbnb listings in the state of New York. It contains the following columns:

  • listing_id: The unique identifier for a listing
  • description: The description used on the listing
  • host_id: Unique identifier for a host
  • neighbourhood_full: Name of boroughs and neighbourhoods
  • coordinates: Coordinates of listing (latitude, longitude)
  • listing_added: Date of added listing
  • room_type: Type of room
  • rating: Rating from 0 to 5.
  • price: Price per night for listing
  • number_of_reviews: Amount of reviews received
  • reviews_per_month: Number of reviews per month
  • availability_365: Number of days available per year
  • number_of_stays: Total number of stays thus far

Questions to answer

  • Question 1: List the top 10 most reviewed private rooms
  • Question 2: What are the cheapest 10 private rooms in New York?
  • Question 3: What is the average availability of a private room in New York?

Q&A

  • Question 4: Which listings have an availability of fewer than 30 days a year but have fewer than 10 reviews?
  • Question 5: What is the average number of reviews per room type, ordered by the average in descending order?
  • Question 6: What is the number and average price of listings by room type where such listings are available for more than 250 days a year?

Some SQL Basics

SQL Commands

Before we dive into answering our questions, let's get familiar with SQL syntax. Today, we'll be covering the most useful SQL commands — let's start with them one by one. This includes the following:

  • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
  • FROM : specifies the table that the data should be returned from.
  • LIMIT: limits the number of rows returned.
  • AS: lets you set an alias for a particular column
  • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
  • WHERE: lets you filter on a specific column or value
  • GROUP BY: lets you aggregate on one or more columns
  • MAX(): returns the maximum amount of a particular column
  • AVG(): returns the total average of a particular column

Code Commenting

There are two types of code commenting in Postgres

-- Inline commenting : Used for quick, short notes

"/" Multi line commenting "/" (use without quotation marks): Used for longer comments, such as metadata, or code headers including the author, date, purpose, etc.

Spinner
DataFrameavailable as
df
variable
-- SELECT the first 10 rows of all the columns from the airbnb dataset
SELECT *
FROM 'airbnb_data.csv'
LIMIT 10;
Spinner
DataFrameavailable as
df1
variable
-- SELECT the first 10 rows of the listing_id, description, and neighbourhood_full columns
SELECT  listing_id, description, neighbourhood_full
FROM airbnb_data.csv
LIMIT 10;
Spinner
DataFrameavailable as
df2
variable
-- Set an alias for listing_id, description and neighbourhood_full
SELECT listing_id AS Listing_Id, description AS Desc, neighbourhood_full AS Location
FROM airbnb_data.csv;
Spinner
DataFrameavailable as
df3
variable
-- Sort the output by number of stays
SELECT *
FROM airbnb_data.csv
ORDER BY number_of_stays DESC;
Spinner
DataFrameavailable as
df4
variable
-- Filter room type by private room
SELECT *
FROM airbnb_data.csv
WHERE room_type = 'Private Room';
Spinner
DataFrameavailable as
df5
variable
-- Get the average price for all rooms
SELECT AVG(price) AS Avg_Price
FROM airbnb_data.csv;
Spinner
DataFrameavailable as
df
variable
-- Get the average price per room type
SELECT room_type, AVG(price) As avg_price
FROM airbnb_data.csv
GROUP BY 1;

Q&A

Let's answer some questions!

Question 1: List the top 10 most reviewed private rooms

In order to answer this question, we need to filter on room_type and order by reviews_per_month, and limit to the top 10 results. Here are the commands we need to use:

  • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
  • FROM : specifies the table that the data should be returned from.
  • WHERE: specifies a condition on a column
  • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
  • LIMIT: returns the specified number of rows
Spinner
DataFrameavailable as
df10
variable
-- List the top 10 most reviewed private rooms
SELECT *
FROM airbnb_data.csv
WHERE room_type = 'Private Room'
ORDER BY reviews_per_month DESC
LIMIT 10;
Hidden code

Question 2: What are the cheapest 10 private rooms in New York?

In order to answer this question, we need to filter on room_type and order by price, and limit to the top 10 results. Here are the commands we need to use:

  • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
  • FROM : specifies the table that the data should be returned from.
  • WHERE: specifies a condition on a column
  • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
  • LIMIT: returns the specified number of rows
Spinner
DataFrameavailable as
df11
variable
-- List the top 10 cheapest private rooms in New York
SELECT *
FROM airbnb_data.csv
WHERE room_type = 'Private Room'
ORDER BY price ASC
LIMIT 10;