Skip to content

If the data include duplicate, we use function COUNT still include duplicates, using SELECT COUNT(DISTINCT sth) to **remove duplicates. **

OR, AND, BETWEEN are used to query multiple criteria.

OR to satisfy at least 1 condition.

WHERE + AND to satisfy ALL criterias,

WHERE + AND & WHERE + OR must be used in ()

WHERE + BETWEEN + AND for filtering values within a specified range

BETWEEN means inclusive, it contains the beginning and end values

FILTERING TEXT Filter LIKE, NOT LIKE, IN

  1. LIKE used to search for a pattern in a field

% will match 0, 1 or many characters in the text

For example: SELECT name FROM people WHERE name LIKE 'Ade%' the result will come out Adele, Adelia, Aden

_ match a single character

For example: SELECT name FROM people WHERE name LIKE 'Ev_'; The result come out: Eve, Eva etc

  1. NOT LIKE To find records that don't match the specified pattern.

WHERE....NOT LIKE....

Arithmetic

To divide to get exact result, need to add 0.

For example: 4.0/3.0 = 1.333

GROUPING DATA

To summarize data for a particular group of results.

GROUP BY is commonly used with aggregate functions to provide summary statistics.

Can combine GROUP BY with ORDER BY to group our results, make a calculation, and then order our results.

GROUP BY is always written before ORDER BY

SORTING

To put our data in a specific order.

The ORDER BY keyword is used to sort results of one or more fields. When used on its own, it is written after the FROM statement.

Add the ASC keyword to our query to clarify that we are sorting in ascending order (smallest to biggest) For ex: ORDER BY budget ASC

Can use the DESC keyword to sort the results in descending order.

Add your notes here

In SQL, we can't filter aggregate functions with WHERE clauses For ex: WHERE COUNT(title) > 10 => CANNOT

Use HAVING instead WHERE to filter based on the result of an aggregate function. For ex: HAVING COUNT(title) > 10

WHERE filters individual records while HAVING filters grouped records

Spinner
DataFrameas
df
variable
-- Add your own queries here
SELECT *
FROM cinema.reviews
LIMIT 5

Explore Datasets

Use the descriptions, films, people, reviews, and roles tables to explore the data and practice your skills!

  • Which titles in the reviews table have an IMDB score higher than 8.5?
  • Select all titles from Germany released after 2010 from the films table.
  • Calculate a count of all movies by country using the films table.

WHERE is a filtering clause to answer relevant questions, use the "" for the string in the WHERE clause.

NULL means human error or the information is unavailble or unknown.

Using WHERE sth NULL or NOT NULL to include or exlcude missing values.

If want to understand the dataset as a whole but not individual records. One way to do this is to summarize the data using SQL's aggregate functions. An aggregate function performs a calculation on several values and returns a single value.

They are: **COUNT, AVG, MIN, MAX, SUM **

Numerical data function only: AVG(), SUM() Various data types: COUNT, MIN, MAX (A-Z or Z-A)

ROUND() to round number to specified decimal place.