Skip to content

PostgreSQL Summary Stats and Window Functions Note

視窗函數

章節-Fetching, ranking, and paging

這裡筆記中的內容,是依照PostgreSLQ 為主設計的
目前介紹的函數有

  • LAG(參照欄位,格數) :以相對位置往前=PRECEDING幾格,並將參數置入於刮弧中。 用法如下
Spinner
DataFrameas
df
variable
SELECT 
	f.title,
    r.imdb_score,
    LAG(r.imdb_score,1) OVER(ORDER BY r.imdb_score DESC) --前一個分數較高的作品   
FROM cinema.reviews AS r
JOIN cinema.films AS f
ON r.id = f.id
WHERE r.imdb_score IS NOT NULL -- 有未被評分的電影
ORDER BY r.imdb_score DESC
LIMIT 10;
  • LEAD() :以相對位置往後=FOLLOWINGP幾格,並將參數置入於刮弧中。
Spinner
DataFrameas
df
variable
  • FIRSR_VALUE() :取表頭的第一個值。
  • LAST_VALUE :(ORDER BY OXOX RANG EBETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING):取表格的最後一個值,並且要加上後續的補充參數,確保整張表格都被包含。
  • ROW_NUMBERS():用來依照列數,給與對應的數字。
  • RANK():依序排名,佔缺額 i.e. 12345名,234同分,則排名為12225。
  • DENSE_RANK():連續號碼排名 i.e. 12345,234同分,則排名為12223。
  • NTILE() :用分頁來理解最簡單,同時也能將所有查詢結果分最接近平分的分數,可以作為區分四分位數的方法之一。

聚合視窗函數應用

章節-Aggregate window functions and frames

結合另一段課程章節裡的 聚合函數 可以讓視窗函數有更多的篩選能力
目前新學的函數為結合上一段 Fretching的技巧,建立『Frames』在選定的框架中
可以更靈活的與表格的各區塊比對

視窗函數之外更多的功能

章節-Beyond window functions

本篇有超多不合理的解釋跟錯誤 以回覆如下

In the course discribtion says: "Pivot it by Year to get the following reshaped, cleaner table."

Gender20082012
MenAUSFRA
WomenRUSUSA

And I enter the code below (I know is not correct answer)

AS ct (Gender VARCHAR, Year VARCHAR, Country VARCHAR)

to get

GenderYearCountry
MenAUSFRA
WomenRUSUSA

Submit this answer can still get throug the course. I think is not gonna be one of the answer, right?

我寫了以下內容回覆

In the course of  Beyond window functions : Pivoting and Pivoting with ranking, checker work wired.

In Pivoting course description says:
"Pivot it by Year to get the following reshaped, cleaner table."
| Gender | 2008 | 2012 |
|--------|------|------|
| Men    | AUS  | FRA  |
| Women  | RUS  | USA  |

And I enter the code below
(I know is not correct answer)
```SQL
AS ct (Gender VARCHAR,
       Year VARCHAR,
       Country VARCHAR)
```
to get
| Gender | Year | Country |
|--------|------|---------|
| Men    | AUS  | FRA     |
| Women  | RUS  | USA     |

Submit this answer can still get through the course.
I think is not gonna be one of the answer, right?

And more in Pivoting with ranking

Says "You'll need to count the gold medals each country has earned, produce the ranks of each country by medals earned, then pivot the table to this shape."

So, in the query I use COUNT(medal) to collect how many medals each country earned.

But the answer checker says I need to use COUNT(*) means count every columns?

Though, filter work well and result will be the same, but why we use COUNT(*) instead of COUNT(medal).

And because of first question makes me not really trust the hint provide by checker.

Is there anyone can help me to lean SQL properly? 

題外補充

晚上回家看書,多得到一個新的知識 :: INTEGER 是指將後續得到的值,轉為整數形態的意思 但奇怪的部分是這個功能是突然跑出來的

Pivoting

示範完整語法如下

Spinner
DataFrameas
df
variable
-- 這裡是怪異的 Pivoting 轉置功能註解
-- 下方是轉置的啟動語句,意思是說如果沒有轉置的功能要利用EXTENSION的功能
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- 將要轉換的查詢利用 CROSSTAB 並用($$   $$)來包住內容,而這部分應該是指啟用 EXTENSION中的CROSSTAB
SELECT * FROM CROSSTAB($$
-- 這裡選用CTE先建立一個查詢,以便後續欄位與值更好篩選                       
  WITH Country_Awards AS (
    SELECT
      Country,
      Year,
      COUNT(*) AS Awards
    FROM Summer_Medals
    WHERE
      Country IN ('FRA', 'GBR', 'GER')
      AND Year IN (2004, 2008, 2012)
      AND Medal = 'Gold'
    GROUP BY Country, Year)
-- 進入主查詢
  SELECT
    Country,
    Year,
    RANK() OVER
      (PARTITION BY Year
       ORDER BY Awards DESC) :: INTEGER AS rank
  FROM Country_Awards
  ORDER BY Country ASC, Year ASC;
-- 這裡以上都是包含在 $$)中被轉換的主要內容
-- Fill in the correct column names for the pivoted table
$$) AS ct 

/*
是否綁定一定要AS為ct不確定,但下方是呈現需要轉置的欄位將呈現什麼樣子
上方查詢得到的資料是以國家、年度、排名的一張表格如下一個區塊
*/

(Country VARCHAR,
"2004" INTEGER,
"2008" INTEGER,
"2012" INTEGER)

Order by Country ASC;

如果想看到這些國家在相同的時間裡各自的排名
就需要轉置,將年份作為欄位名稱、排名作為值

最終得到查詢結果如下

country 2004 2008 2012 FRA 2 3 3 GBR 3 2 1 GER 1 1 2
繼續上課新章節,介紹了GROUP BY 子句的新功能 ROLLUP()以及 CUBE()都是緊接在ODER BY之後  
兩種子句都將產生NULL值,因為都會新增一個小計後的數值  
ROLLUP():是用來將指定的欄位收合,
Spinner
DataFrameas
df
variable
-- 版本一,可以獲得每個國家的獎牌種類與數量
-- Count the medals per gender and medal type
SELECT
  country,
  gender,
  medal AS Medal_type,
  COUNT(*) AS Awards
FROM medals.Summer_Medals
WHERE
  Year = 2012
  AND Country = 'RUS'
-- Get all possible group-level subtotals
GROUP BY country, Gender, medal
ORDER BY Gender ASC, Medal ASC;

如果加入了ROLLUP會變成以下這樣