Using row_number() over() in PostgreSQL

Original

2020-06-09 08:07

Syntax

row_number() over ([partition by col1] order by col2 [desc])

row_number() assigns a sequential number to each returned row
partition by groups the rows
order by sorts the rows within each group
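
To make the syntax concrete, here is a minimal sketch that runs the same window expression through Python's built-in sqlite3 module, so it can be tried without a PostgreSQL server. This assumes the bundled SQLite is 3.25 or newer (window function support); the table t, its columns and the helper number_rows are hypothetical names used only for illustration.

```python
import sqlite3


def number_rows(rows):
    """Number rows within each col1 group, ordered by col2 descending."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (col1 text, col2 integer)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(
        "select col1, col2, "
        "row_number() over (partition by col1 order by col2 desc) as rn "
        "from t order by col1, rn"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Numbering restarts at 1 for every distinct col1 value.
    for row in number_rows([("a", 10), ("a", 30), ("b", 5), ("a", 20)]):
        print(row)
```

Each partition is numbered independently, which is the property the deduplication in this post relies on.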

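We all know that distinct can remove duplicates, but today I want to focus on deduplicating with the row_number() function.

Test:

Our goal is to delete the duplicate rows in the gg table (for each set of duplicates, keep one original row and delete the extras).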

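First we partition by name and href and order by ctid. (ctid is the physical location of a row: it identifies which data block the record lives in and at which offset within that block. It plays the same role as the rowid pseudo-column in Oracle, only in a different form. For details see: https://www.cnblogs.com/lottu/p/5613098.html)

Then run the following SQL statement:

select row_number() over(partition by name,href order by ctid) as rn,ctid from gg;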

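In the result, rn is the row's sequence number within its group of identical (name, href) values: the first copy gets 1, the second gets 2, and so on.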
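
The numbering can be checked on a tiny data set. The sketch below is an approximation rather than the PostgreSQL session itself: it uses Python's sqlite3 module, with SQLite's rowid standing in for ctid (an assumption, since SQLite has no ctid), and the sample rows and the helper rank_duplicates are hypothetical.

```python
import sqlite3


def rank_duplicates(rows):
    """Return (name, href, rn) for each row, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table gg (name text, href text)")
    conn.executemany("insert into gg values (?, ?)", rows)
    result = conn.execute(
        # rowid plays the role of PostgreSQL's ctid in this sketch.
        "select name, href, "
        "row_number() over (partition by name, href order by rowid) as rn "
        "from gg order by rowid"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("a", "/a"), ("b", "/b"), ("a", "/a"), ("a", "/a")]
    for row in rank_duplicates(sample):
        print(row)
```

Rows that are unique, and the first copy of each duplicated row, get rn = 1; only the extra copies get a larger number.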

Next we query the duplicate rows by running the following statement:

select ctid from (select row_number() over(partition by name,href order by ctid) as rn,ctid from gg )as t where t.rn<>1;

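The result is the ctid of every duplicate row, that is, every copy except the first one in each group.

Finally, delete the duplicates. The complete SQL statement is:

delete from gg where ctid in ( select ctid from (select row_number() over(partition by name,href order by ctid) as rn,ctid from gg )as t where t.rn<>1);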
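
As a rough end-to-end check of the delete, the following sketch replays it with Python's sqlite3 module. It is not the PostgreSQL statement verbatim: rowid is used in place of ctid (an assumption), and the sample data and the helper dedupe are hypothetical.

```python
import sqlite3

# Same shape as the statement above, with rowid standing in for ctid.
DELETE_DUPLICATES = (
    "delete from gg where rowid in ("
    " select rid from ("
    "  select rowid as rid,"
    "         row_number() over (partition by name, href order by rowid) as rn"
    "  from gg"
    " ) as t where t.rn <> 1)"
)


def dedupe(rows):
    """Insert rows, delete the extra copies, return (deleted_count, remaining_rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table gg (name text, href text)")
    conn.executemany("insert into gg values (?, ?)", rows)
    deleted = conn.execute(DELETE_DUPLICATES).rowcount
    remaining = conn.execute("select name, href from gg order by rowid").fetchall()
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(dedupe([("a", "/a"), ("b", "/b"), ("a", "/a"), ("a", "/a")]))
```

On a real PostgreSQL table it is prudent to run the delete inside a transaction and check the row count before committing.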

After it finishes, run the query again:

select ctid from (select row_number() over(partition by name,href order by ctid) as rn,ctid from gg )as t where t.rn<>1;

It now returns no rows, which shows that the goal has been achieved.

Reference: https://blog.csdn.net/qq_35246620/article/details/56290903 (this link explains the difference between distinct and row_number() over())

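Thoughts:

What about deduplication speed? In my tests, deduplicating with row_number() was faster on large data sets, and the larger the data set, the more noticeable the difference. You can try it yourself with the example below.

gg and gg_copy are two identical tables. Run this deduplication statement against gg:

delete from gg where ctid in (select min(ctid) from gg group by name,href having count(href) >1);

Note that this statement deletes only one row per duplicate group (the one with the smallest ctid), so it fully deduplicates only when no row occurs more than twice.

Then run the row_number() version against gg_copy:

delete from gg_copy where ctid in (select ctid from (select row_number() over(partition by (name,href) order by ctid) as rn,ctid from gg_copy ) as t where t.rn<>1);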
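
Before comparing timings it is worth confirming which rows each statement actually leaves behind. The sketch below does that with Python's sqlite3 module; it says nothing about PostgreSQL performance. Assumptions: rowid stands in for ctid, the sample rows and the helper survivors are hypothetical, and the not in (select min(rowid) ...) form is this sketch's own variant, added so that groups with three or more copies are also reduced to one row.

```python
import sqlite3

ROWS = [("a", "/a"), ("b", "/b"), ("a", "/a"), ("c", "/c"), ("a", "/a"), ("c", "/c")]

# GROUP BY form that deletes the smallest rowid of each duplicated group.
GROUP_BY_IN = (
    "delete from gg where rowid in ("
    " select min(rowid) from gg group by name, href having count(href) > 1)"
)
# Sketch-only variant: keep the smallest rowid of every group, delete the rest.
GROUP_BY_NOT_IN = (
    "delete from gg where rowid not in ("
    " select min(rowid) from gg group by name, href)"
)
# row_number() form: delete every copy except the first in each group.
ROW_NUMBER = (
    "delete from gg where rowid in ("
    " select rid from ("
    "  select rowid as rid,"
    "         row_number() over (partition by name, href order by rowid) as rn"
    "  from gg"
    " ) as t where t.rn <> 1)"
)


def survivors(delete_sql, rows=ROWS):
    """Load rows into a fresh table, run one delete, return the remaining rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table gg (name text, href text)")
    conn.executemany("insert into gg values (?, ?)", rows)
    conn.execute(delete_sql)
    kept = conn.execute("select rowid, name, href from gg order by rowid").fetchall()
    conn.close()
    return kept


if __name__ == "__main__":
    for label, sql in [("group by, in", GROUP_BY_IN),
                       ("group by, not in", GROUP_BY_NOT_IN),
                       ("row_number", ROW_NUMBER)]:
        print(label, survivors(sql))
```

With a row that occurs three times in the sample, the in form still leaves two copies of it, while the other two statements leave the same single row per group. For timing on PostgreSQL itself, explain analyze can be used, keeping in mind that it really executes the delete, so wrapping it in a transaction that is rolled back keeps the test tables reusable.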
