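A hash join is a join method in which, when two tables are joined, Oracle first builds a hash table in memory from the table with fewer rows, then scans the table with more rows and probes the hash table, returning the rows that match as the result set. A hash join can only be used with equijoin conditions (=).
Suppose the tables T1 and T2 in the SQL statement below are joined with a hash join, and T1 is the driving table: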
select *
from T1, T2
where T1.id = T2.id and T1.name = 'David';
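Before looking at Oracle's actual steps, the idea behind the statement above can be sketched in a few lines of Python. This is only a conceptual illustration, not Oracle's implementation: the function name `hash_join_t1_t2` and the sample rows are made up, and a plain dictionary stands in for the hash table.

```python
from collections import defaultdict

def hash_join_t1_t2(t1_rows, t2_rows):
    """Join T1 and T2 on id, keeping only T1 rows with name = 'David'."""
    # Build phase: filter the driving table T1, then hash the survivors
    # on the join column.
    hash_table = defaultdict(list)
    for t1 in t1_rows:
        if t1["name"] == "David":
            hash_table[t1["id"]].append(t1)

    # Probe phase: scan T2 once and look each join key up in the hash table.
    joined = []
    for t2 in t2_rows:
        for t1 in hash_table.get(t2["id"], []):
            joined.append((t1, t2))
    return joined

# Hypothetical sample data
t1 = [{"id": 1, "name": "David"}, {"id": 2, "name": "Alice"}, {"id": 3, "name": "David"}]
t2 = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}, {"id": 3, "val": "c"}, {"id": 3, "val": "d"}]
print([(a["id"], b["val"]) for a, b in hash_join_t1_t2(t1, t2)])
# [(1, 'a'), (3, 'c'), (3, 'd')]
```

Each table is read exactly once: T1 to build the hash table, T2 to probe it.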
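Oracle performs the following steps:
1 Compute the number of hash partitions
This number is determined by the values of hash_area_size, db_block_size and _hash_multiblock_io_count.
A hash partition is a logical concept: it consists of multiple hash buckets, and a hash table in turn consists of multiple hash partitions. The hash partition is the unit of I/O: when the hash table grows too large, it is written to disk one hash partition at a time. The hash bucket is the unit that hash values map to; a hash bucket can be pictured as a linked list.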
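The relationship between hash table, hash partition and hash bucket can be pictured with nested lists. The sketch below is an assumption made for illustration only: the sizes `NUM_PARTITIONS` and `BUCKETS_PER_PARTITION` are invented (Oracle derives the real partition count from the parameters named above, by a formula not given here), and `locate` and `spill_partition` are hypothetical helpers.

```python
NUM_PARTITIONS = 4          # hypothetical value
BUCKETS_PER_PARTITION = 8   # hypothetical value

def new_hash_table():
    # hash table -> hash partitions -> hash buckets -> chain of rows
    return [[[] for _ in range(BUCKETS_PER_PARTITION)]
            for _ in range(NUM_PARTITIONS)]

def locate(hash_value):
    """Map a hash value to (partition number, bucket number in that partition)."""
    partition_no = hash_value % NUM_PARTITIONS
    bucket_no = (hash_value // NUM_PARTITIONS) % BUCKETS_PER_PARTITION
    return partition_no, bucket_no

def spill_partition(table, partition_no):
    """Write out one whole partition (the I/O unit) and free its buckets."""
    written = [row for bucket in table[partition_no] for row in bucket]
    table[partition_no] = [[] for _ in range(BUCKETS_PER_PARTITION)]
    return written

table = new_hash_table()
p, b = locate(37)                 # 37 % 4 = 1, (37 // 4) % 8 = 1
table[p][b].append({"id": 37})    # the bucket is a simple chain of rows
```

The bucket is where a hash value lands; the partition is what gets written out as a whole when memory runs short.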
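2 Build the hash table for the driving result set S
2.1 Scan the driving result set and compute hash values
Filter the driving table T1 by the predicate (T1.name = 'David') to obtain the driving result set S. Read every row of S and hash it on the join column (T1.id).
Oracle uses two hash algorithms, so every row of S gets two hash values, denoted hash_value_1 and hash_value_2.
2.2 Store the data in hash partitions
Based on hash_value_1, Oracle maps the rows of the driving result set S to hash buckets in the different hash partitions. Each entry stored in a hash bucket contains the columns selected by the SQL statement, the join column, and hash_value_2.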
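As a rough sketch of steps 2.1 and 2.2: the code below computes two hash values per row and uses the first to pick the partition and bucket, storing the second alongside the row. Oracle's two hash algorithms are not described here, so `zlib.crc32` and `zlib.adler32` are stand-ins chosen only because they are deterministic; the sizes and the helper names `two_hash_values` and `store_row` are likewise assumptions.

```python
import zlib

NUM_PARTITIONS = 4          # hypothetical value
BUCKETS_PER_PARTITION = 8   # hypothetical value

def two_hash_values(join_key):
    """Stand-ins for hash_value_1 and hash_value_2 (not Oracle's algorithms)."""
    data = str(join_key).encode("utf-8")
    return zlib.crc32(data), zlib.adler32(data)

def store_row(partitions, row, join_col, select_cols):
    """Place one row of S in the bucket chosen by hash_value_1."""
    h1, h2 = two_hash_values(row[join_col])
    partition_no = h1 % NUM_PARTITIONS
    bucket_no = (h1 // NUM_PARTITIONS) % BUCKETS_PER_PARTITION
    entry = {
        "select": {c: row[c] for c in select_cols},  # columns selected by the query
        "join_key": row[join_col],                   # join column
        "hash_value_2": h2,                          # second hash value
    }
    partitions[partition_no][bucket_no].append(entry)
    return partition_no, bucket_no

partitions = [[[] for _ in range(BUCKETS_PER_PARTITION)]
              for _ in range(NUM_PARTITIONS)]
where = store_row(partitions, {"id": 7, "name": "David"}, "id", ["id", "name"])
```

Rows with the same join key always land in the same bucket, which is what makes the later probe possible.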
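We denote each hash partition of the driving result set S as S[i].
2.3 Build the bitmap
The bitmap marks, for every hash bucket of each S[i], whether the bucket contains any rows.
2.4 If the driving result set S is large, spill data to disk (temp tablespace)
If S is large, building its hash table can fill up the hash area in the PGA (sized by hash_area_size). Oracle then writes the hash partition that holds the most rows in the hash area to disk. Steps 2.1 to 2.4 are repeated until all rows have been read.
In addition, if a row maps to a hash partition that has already been written to disk while the hash table for S is being built, Oracle writes the selected columns, the join column and hash_value_2 to the corresponding hash bucket of that on-disk hash partition.
2.5 Sort
Sort the hash partitions of the driving result set S by row count.
3 Scan the probed result set B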
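3.1 Scan the probed result set B and apply bitmap filtering
Denote the probed result set (T2) as B. Read every row of B and hash it on the join column (T2.id), obtaining the two hash values hash_value_1 and hash_value_2 as in step 2. Oracle uses hash_value_1 to look for the matching hash bucket in S[i]:
- If a matching bucket is found, the join columns are compared; if they are equal, the joined row is returned, otherwise the row of B is discarded.
- If no matching bucket is found, Oracle consults the bitmap built in step 2.3, which determines whether the row of B is written to disk (to be joined later with an S[i] that was spilled) or discarded.
[This use of the bitmap to decide whether the row of B that corresponds to hash_value_1 is written to disk is the so-called bitmap filtering.]
We denote each hash partition of B as B[j]. Once all rows of B have been read, the B[j] are complete.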
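Steps 2.3, 2.4 and 3.1 can be sketched together as follows. This is a simplified model under stated assumptions, not Oracle's code: `zlib.crc32` stands in for hash_value_1, Python dictionaries stand in for the temp tablespace, the bucket and partition counts are invented, and `memory_limit` (a row count) is a hypothetical substitute for the hash area size.

```python
import zlib

NUM_PARTITIONS = 4   # hypothetical value
NUM_BUCKETS = 32     # hypothetical; bucket b belongs to partition b % NUM_PARTITIONS

def bucket_of(key):
    # Stand-in for hash_value_1
    return zlib.crc32(str(key).encode("utf-8")) % NUM_BUCKETS

def build_phase(s_rows, key, memory_limit):
    in_memory = {}                  # partition no -> {bucket no -> rows of S}
    spilled = {}                    # partition no -> rows of S "on disk"
    bitmap = [False] * NUM_BUCKETS  # step 2.3: does the bucket hold any row?
    rows_in_memory = 0
    for row in s_rows:
        b = bucket_of(row[key])
        p = b % NUM_PARTITIONS
        bitmap[b] = True
        if p in spilled:            # partition already on disk: append there
            spilled[p].append(row)
            continue
        in_memory.setdefault(p, {}).setdefault(b, []).append(row)
        rows_in_memory += 1
        if rows_in_memory > memory_limit:
            # Step 2.4: write out the in-memory partition with the most rows.
            victim = max(in_memory,
                         key=lambda q: sum(len(r) for r in in_memory[q].values()))
            victim_rows = [r for rows in in_memory.pop(victim).values() for r in rows]
            spilled[victim] = victim_rows
            rows_in_memory -= len(victim_rows)
    return in_memory, spilled, bitmap

def probe_phase(b_rows, s_key, b_key, in_memory, bitmap):
    joined, b_spilled, filtered_out = [], {}, 0
    for row in b_rows:
        b = bucket_of(row[b_key])
        p = b % NUM_PARTITIONS
        bucket = in_memory.get(p, {}).get(b)
        if bucket is not None:      # matching bucket in memory: compare join columns
            joined.extend((s, row) for s in bucket if s[s_key] == row[b_key])
        elif bitmap[b]:             # bucket has rows, but they are on disk
            b_spilled.setdefault(p, []).append(row)
        else:                       # bitmap filtering: no row of S can match
            filtered_out += 1
    return joined, b_spilled, filtered_out
```

After `probe_phase` returns, `joined` holds the rows that could be matched in memory, while `spilled` and `b_spilled` hold the on-disk S[i] and B[j] that still have to be joined pair by pair.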
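3.2 Build hash tables again
At this point Oracle has processed the S[i] and B[j] that are in memory; only the S[i] and B[j] on disk remain.
Because S[i] and B[j] were built with the same hash function, only an S[i] and a B[j] with the same hash partition number can satisfy the join condition, so processing the on-disk data only requires joining the pairs of S[i] and B[j] that share a hash partition number.
[For each such pair, Oracle picks the side with fewer rows as the driving result set, so the driving side may change from pair to pair. This is dynamic role reversal.]
Once every pair of S[i] and B[j] with the same hash partition number has been processed, the hash join is complete.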
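The pairwise processing with dynamic role reversal might look like this in outline. It is a sketch under assumptions: the on-disk partitions are passed in as plain dictionaries keyed by hash partition number, and `join_disk_pairs` is a hypothetical name.

```python
from collections import defaultdict

def join_disk_pairs(s_parts, b_parts, s_key, b_key):
    """Join on-disk S[i] and B[j] that share a hash partition number."""
    joined, build_side = [], {}
    for p in sorted(s_parts.keys() & b_parts.keys()):
        s_rows, b_rows = s_parts[p], b_parts[p]
        # Dynamic role reversal: the side with fewer rows becomes the build side.
        swapped = len(b_rows) < len(s_rows)
        build_side[p] = "B" if swapped else "S"
        build_rows, build_key = (b_rows, b_key) if swapped else (s_rows, s_key)
        probe_rows, probe_key = (s_rows, s_key) if swapped else (b_rows, b_key)

        table = defaultdict(list)
        for row in build_rows:
            table[row[build_key]].append(row)
        for row in probe_rows:
            for match in table.get(row[probe_key], []):
                # Always emit (row of S, row of B), whichever side was built.
                joined.append((row, match) if swapped else (match, row))
    return joined, build_side

# Hypothetical on-disk partitions
s_parts = {0: [{"id": 1}, {"id": 2}, {"id": 3}], 1: [{"id": 10}]}
b_parts = {0: [{"t_id": 1}], 1: [{"t_id": 10}, {"t_id": 10}, {"t_id": 11}]}
rows, roles = join_disk_pairs(s_parts, b_parts, "id", "t_id")
print(roles)   # {0: 'B', 1: 'S'}
```

Partition 0 is built from B because B[0] is smaller there, while partition 1 is built from S; the result is the same either way.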
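II. Characteristics of hash joins
1. A hash join can only be used with equijoin conditions
2. The choice of driving table affects execution efficiency and performance
3. The driving table and the probed table are each accessed at most once
Build the test data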
SQL> CREATE TABLE t1 (
2 id NUMBER NOT NULL,
3 n NUMBER,
4 pad VARCHAR2(4000),
5 CONSTRAINT t1_pk PRIMARY KEY(id)
6 );
Table created.
SQL> CREATE TABLE t2 (
2 id NUMBER NOT NULL,
3 t1_id NUMBER NOT NULL,
4 n NUMBER,
5 pad VARCHAR2(4000),
6 CONSTRAINT t2_pk PRIMARY KEY(id),
7 CONSTRAINT t2_t1_fk FOREIGN KEY (t1_id) REFERENCES t1
8 );
Table created.
SQL> CREATE TABLE t3 (
2 id NUMBER NOT NULL,
3 t2_id NUMBER NOT NULL,
4 n NUMBER,
5 pad VARCHAR2(4000),
6 CONSTRAINT t3_pk PRIMARY KEY(id),
7 CONSTRAINT t3_t2_fk FOREIGN KEY (t2_id) REFERENCES t2
8 );
Table created.
SQL> CREATE TABLE t4 (
2 id NUMBER NOT NULL,
3 t3_id NUMBER NOT NULL,
4 n NUMBER,
5 pad VARCHAR2(4000),
6 CONSTRAINT t4_pk PRIMARY KEY(id),
7 CONSTRAINT t4_t3_fk FOREIGN KEY (t3_id) REFERENCES t3
8 );
Table created.
SQL> execute dbms_random.seed(0)
PL/SQL procedure successfully completed.
SQL> INSERT INTO t1 SELECT rownum, rownum, dbms_random.string('a',50) FROM dual CONNECT BY level <= 10 ORDER BY dbms_random.random;
10 rows created.
SQL> INSERT INTO t2 SELECT 100+rownum, t1.id, 100+rownum, t1.pad FROM t1, t1 dummy ORDER BY dbms_random.random;
100 rows created.
SQL> INSERT INTO t3 SELECT 1000+rownum, t2.id, 1000+rownum, t2.pad FROM t2, t1 dummy ORDER BY dbms_random.random;
1000 rows created.
SQL> INSERT INTO t4 SELECT 10000+rownum, t3.id, 10000+rownum, t3.pad FROM t3, t1 dummy ORDER BY dbms_random.random;
10000 rows created.
SQL> COMMIT;
Commit complete.
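Compare the hash join, the nested loops join and the sort merge join. (The plans below use an index T4_T3_ID on T4; its creation is not shown in the test data above.)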
SQL> select * from t3, t4 where t3.id = t4.t3_id;
10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1396201636
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 1250K| 35 (3)| 00:00:01 |
|* 1 | HASH JOIN | | 10000 | 1250K| 35 (3)| 00:00:01 |
| 2 | TABLE ACCESS FULL| T3 | 1000 | 63000 | 5 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T4 | 10000 | 634K| 29 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T3"."ID"="T4"."T3_ID")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
779 consistent gets
0 physical reads
0 redo size
1376470 bytes sent via SQL*Net to client
7745 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed
SQL> select /*+ leading(t3) use_nl(t4) */* from t3, t4 where t3.id = t4.t3_id;
10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2039660043
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 1250K| 11007 (1)| 00:02:13 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10000 | 1250K| 11007 (1)| 00:02:13 |
| 3 | TABLE ACCESS FULL | T3 | 1000 | 63000 | 5 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | T4_T3_ID | 10 | | 1 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| T4 | 10 | 650 | 11 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."ID"="T4"."T3_ID")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
12605 consistent gets
0 physical reads
0 redo size
342258 bytes sent via SQL*Net to client
7745 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10000 rows processed
SQL> select /*+ leading(t3) use_merge(t4) */* from t3, t4 where t3.id = t4.t3_id;
10000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3831111046
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 1250K| | 193 (2)| 00:00:03 |
| 1 | MERGE JOIN | | 10000 | 1250K| | 193 (2)| 00:00:03 |
| 2 | SORT JOIN | | 1000 | 63000 | | 6 (17)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T3 | 1000 | 63000 | | 5 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 10000 | 634K| 1592K| 187 (1)| 00:00:03 |
| 5 | TABLE ACCESS FULL| T4 | 10000 | 634K| | 29 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."ID"="T4"."T3_ID")
filter("T3"."ID"="T4"."T3_ID")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
119 consistent gets
0 physical reads
0 redo size
344114 bytes sent via SQL*Net to client
7745 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
10000 rows processed
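The execution plans above show:
join method | sorts | logical reads | estimated time (mm:ss) |
hash join | 0 | 779 | 00:01 |
nested loops | 0 | 12605 | 02:13 |
merge join | 2 | 119 | 00:03 |
The hash join has the lowest estimated cost of the three (35, against 11007 for nested loops and 193 for the merge join). Evidently, the hash join that Oracle introduced avoids the large volume of random reads incurred by a nested loops join, and also avoids the heavy sorting cost of a sort merge join.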
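To make the difference in access patterns concrete, here is a toy comparison of the three join methods. The counters are a simplification invented for this sketch (they are not Oracle statistics), the data is made up, and a dictionary stands in for the index on the join column of the inner table.

```python
from collections import defaultdict

def nested_loops_join(outer, inner_index, outer_key):
    lookups, result = 0, []
    for o in outer:
        lookups += 1                          # one index probe per outer row
        for i in inner_index.get(o[outer_key], []):
            result.append((o, i))
    return result, {"index_lookups": lookups, "sorts": 0}

def sort_merge_join(left, right, left_key, right_key):
    left_sorted = sorted(left, key=lambda r: r[left_key])      # sort 1
    right_sorted = sorted(right, key=lambda r: r[right_key])   # sort 2
    result, i, j = [], 0, 0
    while i < len(left_sorted) and j < len(right_sorted):
        lv, rv = left_sorted[i][left_key], right_sorted[j][right_key]
        if lv < rv:
            i += 1
        elif lv > rv:
            j += 1
        else:
            k = j                             # emit the whole matching group
            while k < len(right_sorted) and right_sorted[k][right_key] == lv:
                result.append((left_sorted[i], right_sorted[k]))
                k += 1
            i += 1
    return result, {"index_lookups": 0, "sorts": 2}

def hash_join(build, probe, build_key, probe_key):
    table = defaultdict(list)
    for b in build:
        table[b[build_key]].append(b)
    result = [(b, p) for p in probe for b in table.get(p[probe_key], [])]
    return result, {"index_lookups": 0, "sorts": 0}

# Toy data: 100 parent rows, 10 child rows per parent
t3 = [{"id": i} for i in range(1, 101)]
t4 = [{"id": 1000 + n, "t3_id": 1 + n % 100} for n in range(1000)]
t4_index = defaultdict(list)                  # stand-in for an index on t4(t3_id)
for row in t4:
    t4_index[row["t3_id"]].append(row)

for name, (rows, stats) in {
    "nested loops": nested_loops_join(t3, t4_index, "id"),
    "merge join": sort_merge_join(t3, t4, "id", "t3_id"),
    "hash join": hash_join(t3, t4, "id", "t3_id"),
}.items():
    print(name, len(rows), stats)
```

All three return the same 1000 joined rows; only the hash join does so without per-row index probes and without sorting either input.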
SQL> alter session set statistics_level=ALL;
SQL> select /*+ leading(t3) use_hash(t4) */* from t3, t4 where t3.id = t4.t3_id and t3.n = 1100;
10 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------
SQL_ID f57pu4khtptsc, child number 0
-------------------------------------
select /*+ leading(t3) use_hash(t4) */* from t3, t4 where t3.id =
t4.t3_id and t3.n = 1100
Plan hash value: 1396201636
----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.03 | 120 | | | |
|* 1 | HASH JOIN | | 1 | 10 | 10 |00:00:00.03 | 120 | 737K| 737K| 389K (0)|
|* 2 | TABLE ACCESS FULL| T3 | 1 | 1 | 1 |00:00:00.01 | 15 | | | |
| 3 | TABLE ACCESS FULL| T4 | 1 | 10000 | 10000 |00:00:00.01 | 105 | | | |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T3"."ID"="T4"."T3_ID")
2 - filter("T3"."N"=1100)
Add an index on the predicate column (n) of table T3:
SQL> create index t3_n on t3(n);
Index created.
SQL> select /*+ leading(t3) use_hash(t4) */* from t3, t4 where t3.id = t4.t3_id and t3.n = 1100;
10 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID f57pu4khtptsc, child number 0
-------------------------------------
select /*+ leading(t3) use_hash(t4) */* from t3, t4 where t3.id =
t4.t3_id and t3.n = 1100
Plan hash value: 2452410886
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.03 | 108 | | | |
|* 1 | HASH JOIN | | 1 | 10 | 10 |00:00:00.03 | 108 | 737K| 737K| 389K (0)|
| 2 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
|* 3 | INDEX RANGE SCAN | T3_N | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
| 4 | TABLE ACCESS FULL | T4 | 1 | 10000 | 10000 |00:00:00.01 | 105 | | | |
--------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T3"."ID"="T4"."T3_ID")
3 - access("T3"."N"=1100)
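The execution plan shows that buffers drop from 120 to 108 (T3 is now read with 3 buffers instead of 15), so an index on the predicate column can reduce the logical reads of a hash join.
Next, with an equijoin condition and the small table (the small result set) as the driving table, compare the hash join with the nested loops join. The first plan below reads T3 with a full table scan, the second through the index T3_N: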
SQL> select * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
SQL_ID c204pd6srpjfq, child number 0
-------------------------------------
select * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100
Plan hash value: 2039660043
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 29 |
| 1 | NESTED LOOPS | | 1 | | 10 |00:00:00.01 | 29 |
| 2 | NESTED LOOPS | | 1 | 10 | 10 |00:00:00.01 | 19 |
|* 3 | TABLE ACCESS FULL | T3 | 1 | 1 | 1 |00:00:00.01 | 16 |
|* 4 | INDEX RANGE SCAN | T4_T3_ID | 1 | 10 | 10 |00:00:00.01 | 3 |
| 5 | TABLE ACCESS BY INDEX ROWID| T4 | 10 | 10 | 10 |00:00:00.01 | 10 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("T3"."N"=1100)
4 - access("T3"."ID"="T4"."T3_ID")
SQL> select * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------
SQL_ID c204pd6srpjfq, child number 0
-------------------------------------
select * from t3, t4 where t3.id = t4.t3_id and t3.n = 1100
Plan hash value: 2304842513
-------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
-------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 10 |00:00:00.01 | 17 | 1 |
| 1 | NESTED LOOPS | | 1 | | 10 |00:00:00.01 | 17 | 1 |
| 2 | NESTED LOOPS | | 1 | 10 | 10 |00:00:00.01 | 7 | 1 |
| 3 | TABLE ACCESS BY INDEX ROWID| T3 | 1 | 1 | 1 |00:00:00.01 | 4 | 1 |
|* 4 | INDEX RANGE SCAN | T3_N | 1 | 1 | 1 |00:00:00.01 | 3 | 1 |
|* 5 | INDEX RANGE SCAN | T4_T3_ID | 1 | 10 | 10 |00:00:00.01 | 3 | 0 |
| 6 | TABLE ACCESS BY INDEX ROWID | T4 | 10 | 10 | 10 |00:00:00.01 | 10 | 0 |
-------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T3"."N"=1100)
5 - access("T3"."ID"="T4"."T3_ID")
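The execution plans above show that with the nested loops join the elapsed time (A-Time) drops from 0.03 s to 0.01 s and buffers drop from 108 to 17. So for an equijoin with an index on the join column, a nested loops join suits queries that return few rows, while a hash join suits queries that return many rows.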
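IV. Summary
In most cases a hash join is more efficient than a nested loops join or a sort merge join:
1. A hash join may be faster than a nested loops join, because working with an in-memory hash table is faster than traversing a B-tree.
2. A hash join may be faster than a sort merge join, because it does not sort rows: only the hash partitions of one input are ordered, by row count. A sort merge join has to sort the rows of both tables before the merge, so the sort step tends to dominate its cost.
A hash join suits joining one large and one small result set where many rows are returned, especially when the hash table fits entirely in the hash area. In that case the execution time of the hash join is roughly the sum of the times to full-scan the two result sets.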
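The optimizer's cost figures from the first hash join plan above give a rough feel for this. The short calculation below only restates numbers from that plan (cost 5 for the full scan of T3, 29 for T4, 35 for the HASH JOIN); note that these are optimizer cost estimates, not measured times, so this is an illustration rather than a proof.

```python
# Costs taken from the HASH JOIN plan for: select * from t3, t4 where t3.id = t4.t3_id
t3_full_scan_cost = 5
t4_full_scan_cost = 29
hash_join_cost = 35

scan_cost = t3_full_scan_cost + t4_full_scan_cost   # cost of the two full scans
join_overhead = hash_join_cost - scan_cost          # what the join itself adds
scan_share = scan_cost / hash_join_cost

print(scan_cost, join_overhead, round(scan_share, 2))   # 34 1 0.97
```

In this plan the two full scans account for nearly all of the estimated cost, which matches the statement that an in-memory hash join costs about as much as scanning both inputs.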
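When tuning SQL whose tables are joined with a hash join, consider the following:
1. Make sure the smaller result set is the driving result set
2. If there are predicates, consider adding indexes on the predicate columns
3. Make sure statistics have been gathered on the tables and join columns involved; if the data in a join column is unevenly distributed, consider gathering a histogram on that column
4. Increase hash_area_size so that the hash join completes entirely in memory, that is, make sure the hash area in the PGA can hold the hash operation (hash_area_size only applies when workarea_size_policy is MANUAL; under automatic PGA management the work area size is governed by pga_aggregate_target)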