Skip to content

Introduction to SQL

Here you can access the books table used in the course.

Take Notes

Add notes about the concepts you've learned and SQL cells with queries you want to keep.

Data type: VARCHAR (文字/String), INT (整數/Integers), NUMERIC (有小數點後數字嘅數/Float) > 可以係SCHEMA睇佢屬於咩Data type(類似SPSS Variable View)

Add your notes here

Spinner
DataFrameas
books
variable
-- Add your own queries here
SELECT (個別想揀選嘅資料類/table column name) 
FROM books;

(如想揀選多項資料就用,分隔)

SELECT a, b
FROM books;

(如想揀選所有資料就用*)

SELECT *
FROM books;

(如果想揀選某項名嘅資料又想同時係結果入面改名) (詳情見Aliasing)

SELECT a AS b
FROM books;

(如果想結果唔重複出現,譬如有好多人揀2但你只需要知有人揀2)(可同時揀選多項資料)

SELECT DISTINCT a
FROM books;

(如想整成現成表格)

CREATE VIEW (表格名) AS
SELECT (需要資料名,可包括多項)
FROM books;

(如果想限制顯示結果量)

SELECT a
FROM books
LIMIT (想睇幾多結果嘅數字);

(如果想統計有幾多value)

SELECT COUNT a
FROM books;

COUNT _(名) 只會包括non-missing value
但*就會包括埋missing values

(如果需要統計唔同資料項目)

SELECT COUNT a, COUNT b
FROM books;

(如果想統計有幾多筆資料)

SELECT COUNT(*)
FROM books;

(連埋DISTINCT用可以計有幾多Value而同時又排除咗唔同人但又相同嘅value)

SELECT COUNT(DISTINCT a)
FROM books;

WHERE (用作Filter clause) (係FROM之後lIMIT之前)

例如:

SELECT title
FROM films
WHERE release_year > 1960;

結果:只顯示1960年之後出產嘅電影名

可以改成如 > (大於) < (細於) <=(細於等於) = (等於) <> (排除) (數字上)
亦可改成如 WHERE country = 'Japan' > 只顯示日本產電影

OR AND BETWEEN: 多種Filter criteria

每個Field都要寫清楚, 例如 WHERE release_year = 1994 OR release_year = 2000,而唔係WHERE release_yar = 1994 OR 2000;例如WHERE release_year > 1994 AND release_year < 2000,而唔係WHERE release_year > 1994 AND < 2000

而BETWEEN(包括埋頭同尾兩年)

SELECT title
FROM films
WHERE release_year 
	BETWEEN 1994 AND 2000 AND country = 'UK';

可以Combine唔同Filter,例如:(下面縮排非必須,但係習慣)

SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
	AND (certification = 'PG' OR certification = 'R');
    
Word Filter: LIKE, NOT LIKE, IN

LIKE: 搵有呢個字嘅資料,例如

SELECT name
FROM people
Where name LIKE 'Ade%' 或 WHERE name LIKE '_';

%包括所有一或多個字元(如Adel, Aden); _只會包括一個字元(如Eve);擺放符號嘅位置唔同會顯示嘅結果唔同亦可一齊用,例如:

SELECT name
FROM people
Where name LIKE '%r'; > 會顯示所有以r為文字結尾嘅資料

SELECT name
FROM people
Where name LIKE '__t%' > 會顯示第三個字係t,後面唔限字元長短嘅資料

NOT LIKE: 顯示排除某字詞嘅資料,例如

SELECT name
FROM people
Where name NOT LIKE 'A.%' 

然後排除結果有'A.'嘅資料

前面提過嘅OR可以用去揀選好多符合要求嘅資料,但亦可以用IN去簡化佢,例如:

SELECT title
FROM films
WHERE release_year IN (1920, 1930, 1940); 

效果等同於WHERE release_year = 1920 OR release_year = 1930 OR release_year = 1940

如果IN要用作Filter字眼,記得字眼要加''

睇有咩資料項有Missing DATA,例如

WHERE birthdate IS NULL: > 顯示birthdate係NULL嘅DATA

如果想只計算冇Missing Data嘅可用NOT NULL,例如:

SELECT COUNT(name) AS count_birthdates
FROM people
WHERE birthdate IS NOT NULL; > 只計算有Birthdate資訊嘅資料

>兩者用途一樣,睇自己想點寫

Aggregate function:計算用

AVG():平均值,只能用於Numerical field
SUM():總和,只能用於Numerical field
MIN():最小數值/最低/A/最細年份,可用於唔同field
MAX():最大數值/最高/Z/最大年份,可用於唔同field
COUNT().可用於唔同field

> 應用時可以加上Alising(方便記住寫code時知係咩數嚟)以及WHERE

Where例:

SELECT AVG(budget) AS avg_budget
FROM films
WHERE release_year >=2010;

如想去除小數點:ROUND() > ()包括兩樣野:ROUND(number_to_round, decimal_places), 前者係想要round嘅數字, 後者係幾多小數點 > 後者可唔寫, 咁嘅話就會取整數而冇小數點 > 可寫成-_形式, 例如

SELECT ROUND(AVG(budget), -5) AS avg_budget
FROM files
WHERE release_year >= 2010;
結果會取最接近嘅十萬, 後面跟五個0

算式符號/Arithmetic:+ - * / 
例如 SELECT (4 + 3); 
結果會出7
除整數只會出整數/Integer > 如想更準確:
SELECT (4.0 / 3.0);
結果會出1.33333...

ORDER BY:Sort結果次序,by default由細到大,例如:
SELECT title, budget
FROM films
ORDER BY budget;
如果要強調sorting order, 可以加ASC(細到大)/DESC(大到細)係ORDER BY _ 尾
如果要排除某啲冇用嘅RESULT例如null,可以加行WHERE係中間,例如:
SELECT title, budget
FROM films
WHERE budget IS NOT NUll
ORDER BY budget DESC/ASC;

ORDER BY可同時做多個field嘅sorting,例如:
ORDER BY field_one, field_two > 呢個時候如第一個field冇法判定order就會由第二個field去判定 > 但sorting時兩個field都要寫sorting preference,例如:
ORDER BY wins DESC, imdb_score DESC;
>如第一個field冇寫DESC就會by default變成ASC
ORDER BY後面可以再加LIMIT去限制顯示結果

GROUP BY:幫DATA分類(似SPSS frequency),例如:
SELECT certification, COUNT (title) AS title_count
FROM films
GROUP BY certification;
>結果顯示唔同certification中有幾多齣戲
GROUP BY亦可以同時做多個field嘅organising,只需要GROUP BY certification, language; 之類

HAVING: 係WHERE想加filter時/想filter grouped records時用嚟代替嘅command,例如
SELECT release_year, COUNT(title) AS title_count
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10; > 只顯示title_count大於10者

INNER JOIN: 匯入另一表格資料至現有表格,例如:
SELECT prime_ministers.country, prime_ministers.continent, prime_minister, president
FROM prime_ministers
INNER JOIN presidents
ON prime_ministers.country = presidents.country;
原本:分開兩個表格顯示同一國家有唔同PM及總統>之後:兩個表格合一
.意義:前面係table名,後面係欄目名,如果兩個表格都有同一欄目就要用,唔係會出現Error
做練習時要注意如果用咗alias,咁要寫其他code時亦要用返Alias去避免錯
另外From嘅Alias會適用於SELECT上, 所以都建議寫FROM再打落然後寫返SELECT等等

可以用AS function,例如:
SELECT p1.country, p1.continent, prime_minister, president
FROM prime_ministers AS p1
INNER JOIN presidents AS p2
ON p1.country = p2.country; (或用USING去指明兩個表格都用緊一樣嘅資料)

LEFT JOIN / RIGHT JOIN:用作取代INNER JOIN
兩個Table一左一右之下,LEFT JOIN後兩TABLE合併, 左邊TABLE先行再按照左邊次序咁顯示(以左為本)
如左右TABLE都有同樣數據就會一齊顯示,如右TABLE冇就會係合併後右邊顯示null
記住有ON或USING去連結兩個表格一樣嘅資料
RIGHT JOIN就同LEFT JOIN相反

FULL JOIN: LEFT及RIGHT JOIN合體

注意LEFT JOIN / RIGHT JOIN 一樣, 但佢地兩者同INNER JOIN及FULL JOIN會顯示唔同結果

如果想加兩個表格就要寫兩個JOIN, USING之類嘅指令要係每個JOIN後兩各寫一次

CROSS JOIN: 將兩個表格所有組合都列出, 唔需要加USING之類嘅指令

SELF JOIN: 同一表格列出唔同資料對比, 但要用Alias, 例如:
SELECT 
	p1.country AS country1,
	p2.country AS country2,
	p1.continent
FROM prime_ministers AS p1
INNER JOIN prime_ministers AS p2
ON p1.continent = p2.continent <--USING個類嘅function
	AND p1.country <> p2.country <--避免出現同一國家
	
Joining Data: 4種
UNION (所有但唔重覆)
UNION ALL (所有且重覆)
INTERSECT (兩種DATA交匯/共有嘅)
EXCEPT (排除另一邊一樣有嘅DATA)

位置係SELECT及FROM後,例如:
SELECT *
FROM left_table
UNION / UNINO ALL / INTERSECT
SELECT * 
FROM right table
注意Join data時兩者要同一樣嘅Field,有一樣COLUMN先可以

INTERSECT比較INNER JOIN
INTERSECT
SELECT *
FROM left_table
INTERSECT
SELECT *
FROM right_table
>結果: 只顯示兩個表格中有一致嘅資料

INNER JOIN
SELECT left_table 
INNER JOIN right_table
ON left_id = right.id
	AND left_val = right.val
>結果: 如果其中一個表格有兩筆,一個表格有一筆, 咁合併後就都會有兩筆

EXCEPT時,兩個表格就算有一格資料一樣,只要另一格資料唔同都會被exclude

Explore Datasets

Use the books table to explore the data and practice your skills!

  • Select only the title column.
  • Alias the title column as book_title.
  • Select the distinct author names from the author column.
  • Select all records from the table and limit your results to 10.