Skip to content
SQL for Absolute Beginners
  • AI Chat
  • Code
  • Report
  • Live Code-Along: SQL for Absolute Beginners

    Key session takeaways

    • Learn the basics of SQL & why it's foundational for data science work.
    • Filter, group by, and analyse datasets using SQL.
    • Answer key business questions with SQL

    The Dataset

    The dataset to be used in this training is a CSV file named airbnb_data.csv, which contains data on airbnb listings in the state of New York. It contains the following columns:

    • listing_id: The unique identifier for a listing
    • description: The description used on the listing
    • host_id: Unique identifier for a host
    • neighbourhood_full: Name of boroughs and neighbourhoods
    • coordinates: Coordinates of listing (latitude, longitude)
    • listing_added: Date of added listing
    • room_type: Type of room
    • rating: Rating from 0 to 5.
    • price: Price per night for listing
    • number_of_reviews: Amount of reviews received
    • reviews_per_month: Number of reviews per month
    • availability_365: Number of days available per year
    • number_of_stays: Total number of stays thus far

    Questions to answer

    • Question 1: List the top 10 most reviewed private rooms
    • Question 2: What are the cheapest 10 private rooms in New York?
    • Question 3: What is the average availability of a private room in New York?

    Q&A

    • Question 4: Which listings have an availability of fewer than 30 days a year but have fewer than 10 reviews?
    • Question 5: What is the average number of reviews per room type, ordered by the average in descending order?
    • Question 6: What is the number and average price of listings by room type where such listings are available for more than 250 days a year?

    Some SQL Basics

    SQL Commands

    Before we dive into answering our questions, let's get familiar with SQL syntax. Today, we'll be covering the most useful SQL commands — let's start with them one by one. This includes the following:

    • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
    • FROM : specifies the table that the data should be returned from.
    • LIMIT: limits the number of rows returned.
    • AS: lets you set an alias for a particular column
    • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
    • WHERE: lets you filter on a specific column or value
    • GROUP BY: lets you aggregate on one or more columns
    • MAX(): returns the maximum amount of a particular column
    • AVG(): returns the total average of a particular column

    Code Commenting

    There are two types of code commenting in Postgres

    -- Inline commenting : Used for quick, short notes

    "/" Multi line commenting "/" (use without quotation marks): Used for longer comments, such as metadata, or code headers including the author, date, purpose, etc.

    Unknown integration
    DataFrameavailable as
    df9
    variable
    Select *
    from 'airbnb_data.csv'
    limit 10;
    
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    Select listing_id, description
    from 'airbnb_data.csv'
    limit 10
    
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df1
    variable
    Select * 
    from 'airbnb_data.csv'
    order by number_of_stays ASC
    limit 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df2
    variable
    Select * 
    from 'airbnb_data.csv'
    where room_type = 'Private Room'
    limit 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df3
    variable
    Select AVG(price) AS 'Average Price'
    from 'airbnb_data.csv'
    
    
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df4
    variable
    Select room_type AS 'Room Type', AVG(price) AS 'Average Price'
    from 'airbnb_data.csv'
    group by room_type
    
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Q&A

    Let's answer some questions!

    Question 1: List the top 10 most reviewed private rooms

    In order to answer this question, we need to filter on room_type and order by reviews_per_month, and limit to the top 10 results. Here are the commands we need to use:

    • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
    • FROM : specifies the table that the data should be returned from.
    • WHERE: specifies a condition on a column
    • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
    • LIMIT: returns the specified number of rows
    Unknown integration
    DataFrameavailable as
    df5
    variable
    select *
    from 'airbnb_data.csv'
    where room_type == 'Private Room'
    order by number_of_reviews DESC
    limit 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Hidden code

    Question 2: What are the cheapest 10 private rooms in New York?

    In order to answer this question, we need to filter on room_type and order by price, and limit to the top 10 results. Here are the commands we need to use:

    • SELECT: returns either all columns using * or specific columns as specified, seperated by a comma.
    • FROM : specifies the table that the data should be returned from.
    • WHERE: specifies a condition on a column
    • ORDER BY: returns the data sorted by column specified. Can be sorted in ASC (ascending) or DESC (descending).
    • LIMIT: returns the specified number of rows
    Unknown integration
    DataFrameavailable as
    df15
    variable
    select *
    from 'airbnb_data.csv'
    where room_type == 'Private Room'
    order by price ASC
    limit 10
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.