os: centos 7.4
db: postgresql 10.11
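Merge Join (sort-merge join): if the row sources are already sorted, the merge join does not have to sort them again, and in that case it can perform better than a hash join.
The key point is that the data is already sorted.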
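A merge join first sorts both tables on the join column, then takes one row from each table and compares their keys.
If the keys match, the pair goes into the result set; if they don't, the row with the smaller key is discarded and the next row is read from that same table. This repeats until one of the two inputs is exhausted.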
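Much of a merge join's cost goes into sorting, which is the main reason it usually does worse than a hash join under otherwise equal conditions, i.e. when the inputs are not already sorted.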
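The procedure above can be sketched in Python. This is a simplified in-memory illustration, assuming both inputs fit in lists of tuples and are joined on a single column; sort_merge_join is a hypothetical name, and this is not how PostgreSQL's executor is written (it streams rows and uses mark/restore on the inner input to rescan runs of duplicate keys).

```python
from typing import Any, List, Sequence, Tuple

Row = Tuple[Any, ...]


def sort_merge_join(left: Sequence[Row], right: Sequence[Row], key: int = 0) -> List[Tuple[Row, Row]]:
    """Inner equi-join of two row lists on column `key` (hypothetical helper)."""
    # Sort phase: order both inputs by the join key (the expensive step).
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])

    result: List[Tuple[Row, Row]] = []
    i = j = 0
    # Merge phase: walk both sorted lists in lockstep.
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left key is smaller: discard this left row
        elif lk > rk:
            j += 1  # right key is smaller: discard this right row
        else:
            # Keys match: find the end of the run of equal keys on the right.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against the whole right run.
            while i < len(left) and left[i][key] == lk:
                for jj in range(j, j_end):
                    result.append((left[i], right[jj]))
                i += 1
            j = j_end
    return result


# Usage: unsorted inputs, ids 1 and 2 appear in both.
t4 = [(3, "c"), (1, "a"), (2, "b")]
t5 = [(2, "x"), (4, "y"), (1, "z")]
print(sort_merge_join(t4, t5))
# [((1, 'a'), (1, 'z')), ((2, 'b'), (2, 'x'))]
```

The two sorted() calls are the sort phase whose cost is discussed above (O(n log n) per input); the merge loop itself is a single pass, roughly linear in the input sizes plus the number of output rows.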
Version
# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
#
# su - postgres
$
$ psql -c "select version();"
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 10.11 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
create table
$ psql
psql (10.11)
Type "help" for help.
postgres=#
postgres=# drop table if exists tmp_t4;
drop table if exists tmp_t5;
postgres=# create table tmp_t4(
  id int8 primary key,
  name varchar(100)
);
create table tmp_t5(
  id int8 primary key,
  name varchar(100)
);
postgres=# insert into tmp_t4
select id,
md5(id::varchar)
from generate_series(1,1000000) as id;
insert into tmp_t5
select id,
md5(id::varchar)
from generate_series(1,1000000) as id;
postgres=# vacuum analyze tmp_t4;
vacuum analyze tmp_t5;
Merge Join
postgres=# set max_parallel_workers_per_gather=0;
postgres=# explain analyze
select t4.*,t5.*
from tmp_t4 t4,
tmp_t5 t5
where 1=1
and t4.id = t5.id
and t4.id <= 99999
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.49..42797.46 rows=98623 width=82) (actual time=0.039..223.628 rows=99999 loops=1)
   Merge Cond: (t4.id = t5.id)
   ->  Index Scan using tmp_t4_pkey on tmp_t4 t4  (cost=0.42..3735.33 rows=98623 width=41) (actual time=0.025..49.473 rows=99999 loops=1)
         Index Cond: (id <= 99999)
   ->  Index Scan using tmp_t5_pkey on tmp_t5 t5  (cost=0.42..35329.43 rows=1000000 width=41) (actual time=0.011..45.703 rows=99999 loops=1)
 Planning time: 0.413 ms
 Execution time: 241.561 ms
(7 rows)
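In the plan above, the index scan on tmp_t5 was estimated at 1000000 rows but actually returned only 99999: the merge join stopped pulling rows from tmp_t5 once tmp_t4 had none left. The Python sketch below models that with two pre-sorted iterators standing in for the index scans. CountingIter and merge_join_sorted are hypothetical names, and the sketch assumes the inner key is unique (as tmp_t5.id is, being a primary key); it is a simplified model, not PostgreSQL's executor.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]


class CountingIter:
    """Iterator wrapper that counts rows pulled, like 'actual rows' in EXPLAIN ANALYZE (hypothetical helper)."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._it: Iterator[Row] = iter(rows)
        self.pulled = 0

    def __iter__(self) -> "CountingIter":
        return self

    def __next__(self) -> Row:
        row = next(self._it)  # raises StopIteration when the input is exhausted
        self.pulled += 1
        return row


def merge_join_sorted(outer: Iterator[Row], inner: Iterator[Row]) -> List[Tuple[Row, Row]]:
    """Merge join of two inputs already sorted on column 0; inner keys assumed unique."""
    result: List[Tuple[Row, Row]] = []
    o: Optional[Row] = next(outer, None)
    i: Optional[Row] = next(inner, None)
    while o is not None and i is not None:
        if o[0] < i[0]:
            o = next(outer, None)  # outer key smaller: advance outer
        elif o[0] > i[0]:
            i = next(inner, None)  # inner key smaller: advance inner
        else:
            result.append((o, i))
            # Inner key is unique, so keep the inner row and move to the next outer row.
            o = next(outer, None)
    # Once either side runs out, the join ends without reading the rest of the other.
    return result


# Usage: outer ids 1..99, inner ids 1..1000, both already ordered (like index scans).
t4 = CountingIter((k, "t4") for k in range(1, 100))
t5 = CountingIter((k, "t5") for k in range(1, 1001))
pairs = merge_join_sorted(t4, t5)
print(len(pairs), t5.pulled)  # 99 99
```

With ids 1..99 on the outer side, only 99 of the 1000 inner rows are read, mirroring rows=99999 on tmp_t5 in the plan.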
postgres=# explain analyze
select t4.*,t5.*
from tmp_t4 t4,
tmp_t5 t5
where 1=1
and t4.id = t5.id
and t4.id <= 99999
order by t4.id
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1.49..42797.46 rows=98623 width=82) (actual time=0.052..226.876 rows=99999 loops=1)
   Merge Cond: (t4.id = t5.id)
   ->  Index Scan using tmp_t4_pkey on tmp_t4 t4  (cost=0.42..3735.33 rows=98623 width=41) (actual time=0.033..51.655 rows=99999 loops=1)
         Index Cond: (id <= 99999)
   ->  Index Scan using tmp_t5_pkey on tmp_t5 t5  (cost=0.42..35329.43 rows=1000000 width=41) (actual time=0.015..46.054 rows=99999 loops=1)
 Planning time: 0.376 ms
 Execution time: 244.241 ms
(7 rows)
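The id column of both tmp_t4 and tmp_t5 is the primary key, so an index scan on tmp_t4_pkey or tmp_t5_pkey returns rows already ordered by id. That is why neither plan needs a Sort node, and why adding order by t4.id in the second query leaves the plan unchanged: the merge join output is already ordered by the join key.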
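A short Python sketch of the ordering point: when both inputs arrive already ordered by the key, as a primary-key index scan delivers them, the merge phase runs with no sort step and its output comes out in key order, so an ORDER BY on that key has nothing left to do. merge_join_presorted and ordered_by_outer_key are hypothetical names, and inner keys are assumed unique.

```python
from typing import List, Tuple

Row = Tuple[int, str]


def merge_join_presorted(outer: List[Row], inner: List[Row]) -> List[Tuple[Row, Row]]:
    """Merge join with no sort phase: both lists must already be ordered by column 0.

    Inner keys are assumed unique (as with a primary key).
    """
    out: List[Tuple[Row, Row]] = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        ok, ik = outer[i][0], inner[j][0]
        if ok < ik:
            i += 1  # outer key smaller: advance outer
        elif ok > ik:
            j += 1  # inner key smaller: advance inner
        else:
            out.append((outer[i], inner[j]))
            i += 1  # stay on inner[j]; the next outer row may have the same key
    return out


def ordered_by_outer_key(pairs: List[Tuple[Row, Row]]) -> bool:
    """True if the join output is already in ascending order of the outer join key."""
    return all(a[0][0] <= b[0][0] for a, b in zip(pairs, pairs[1:]))


# Usage: inputs come out of a "primary key index" already sorted by id.
t4 = [(k, "t4") for k in range(1, 11)]      # ids 1..10
t5 = [(k, "t5") for k in range(1, 21, 2)]   # odd ids 1..19
pairs = merge_join_presorted(t4, t5)
print([p[0][0] for p in pairs])     # [1, 3, 5, 7, 9]
print(ordered_by_outer_key(pairs))  # True: an ORDER BY on this key needs no extra sort
```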
References: