PostgreSQL execution plans: Merge Join

OS: CentOS 7.4
DB: PostgreSQL 10.11

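Merge Join (sort-merge join): if the row sources are already sorted on the join key, no extra sort is needed before the merge, and in that case a merge join can outperform a hash join.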

The key point is that the data is already sorted.

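Merge Join requires both inputs to be sorted on the join column, so any input that is not already sorted has to be sorted first. It then takes one row from each input and compares the join keys.
If the keys match, the joined row goes into the result set; if they do not, the row with the smaller key is discarded and the next row is read from that same input. This repeats until the rows are used up (for an inner join, the merge can stop as soon as either input is exhausted).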
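The matching loop described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not PostgreSQL's actual executor code; the function name `merge_join`, the `key` parameter, and the sample rows are hypothetical, and both inputs are assumed to be lists already sorted by the join key.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists of tuples that are ALREADY sorted by key.

    Hypothetical illustration only; not PostgreSQL's implementation.
    """
    result = []
    i = j = 0
    # Stop as soon as either input is exhausted (inner join).
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key is smaller: discard it, advance left
        elif lk > rk:
            j += 1          # right key is smaller: discard it, advance right
        else:
            # Keys match. Find the run of equal keys on the right side so
            # that duplicate keys on both sides are paired correctly.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    result.append(left[i] + r)
                i += 1
            j = j_end
    return result


# Sample rows shaped like (id, name); the values are made up.
t4 = [(1, "a"), (2, "b"), (3, "c")]
t5 = [(2, "x"), (3, "y"), (4, "z")]
rows = merge_join(t4, t5)
```

Each input is read once from front to back, which is why the merge step itself is cheap when the rows arrive in order.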

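When the inputs are not already sorted, a large share of a Merge Join's cost goes into sorting, which is a main reason it tends to be slower than a Hash Join under otherwise equal conditions.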

Version

# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core) 
# 
# su - postgres
$
$ psql -c "select version();"
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 10.11 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

create table

$ psql
psql (10.11)
Type "help" for help.

postgres=# 
postgres=# drop table if exists tmp_t4;
drop table if exists tmp_t5;


postgres=# create table tmp_t4( 
	id    int8 primary key,
	name  varchar(100)
);

create table tmp_t5( 
	id    int8 primary key,
	name  varchar(100)
);

postgres=# insert into tmp_t4 
select id,
       md5(id::varchar)
  from generate_series(1,1000000) as id;

insert into tmp_t5 
select id,
       md5(id::varchar)
  from generate_series(1,1000000) as id;

postgres=# vacuum analyze tmp_t4;
vacuum analyze tmp_t5;

Merge Join

postgres=# set max_parallel_workers_per_gather=0;

postgres=# explain analyze 
select t4.*,t5.*
  from tmp_t4 t4,
       tmp_t5 t5
 where 1=1
   and t4.id = t5.id
   and t4.id <= 99999   
;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.49..42797.46 rows=98623 width=82) (actual time=0.039..223.628 rows=99999 loops=1)
   Merge Cond: (t4.id = t5.id)
   ->  Index Scan using tmp_t4_pkey on tmp_t4 t4  (cost=0.42..3735.33 rows=98623 width=41) (actual time=0.025..49.473 rows=99999 loops=1)
         Index Cond: (id <= 99999)
   ->  Index Scan using tmp_t5_pkey on tmp_t5 t5  (cost=0.42..35329.43 rows=1000000 width=41) (actual time=0.011..45.703 rows=99999 loops=1)
 Planning time: 0.413 ms
 Execution time: 241.561 ms
(7 rows)

postgres=# explain analyze 
select t4.*,t5.*
  from tmp_t4 t4,
       tmp_t5 t5
 where 1=1
   and t4.id = t5.id
   and t4.id <= 99999   
order by t4.id
;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.49..42797.46 rows=98623 width=82) (actual time=0.052..226.876 rows=99999 loops=1)
   Merge Cond: (t4.id = t5.id)
   ->  Index Scan using tmp_t4_pkey on tmp_t4 t4  (cost=0.42..3735.33 rows=98623 width=41) (actual time=0.033..51.655 rows=99999 loops=1)
         Index Cond: (id <= 99999)
   ->  Index Scan using tmp_t5_pkey on tmp_t5 t5  (cost=0.42..35329.43 rows=1000000 width=41) (actual time=0.015..46.054 rows=99999 loops=1)
 Planning time: 0.376 ms
 Execution time: 244.241 ms
(7 rows)

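The id column is the primary key of both tmp_t4 and tmp_t5, so the Index Scans on tmp_t4_pkey and tmp_t5_pkey return rows already ordered by id and neither plan contains a Sort node. Adding order by t4.id leaves the plan unchanged, because the Merge Join output is already in id order. The scan on tmp_t5 also stops after 99999 rows (actual rows=99999 against an estimate of 1000000), since the merge ends once tmp_t4 runs out of rows.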
