Optimizing Cartesian products with PL/SQL

Reposted from: http://www.itpub.net/thread-1840767-1-1.html

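First, the idea that I personally got the most out of:
In Oracle tuning, understanding the CBO is important, and analyzing execution plans is important;
but tuning is by no means limited to the CBO, and I think the habit of bringing up the CBO whenever tuning is discussed is something of a blind spot.
Do we really know which step takes the most time when a SQL statement or a piece of PL/SQL runs? And why?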
--******************************************************
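The main topic
Many users on the forum have asked a question like this:
My table has a column text containing a delimiter ','. I need to split each value into n rows by the delimiter, each row taking, in order, the part between delimiters.
For example, the data is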
id text
1 a, b, c, d
2 e, f, g
The result I need is
1 a
1 b
1 c
1 d
2 e
2 f
2 g
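To pin down the expected output, here is a small Python model of the transformation. It is not Oracle code, and the function name split_rows is hypothetical. It trims the space that follows each comma in the sample data and skips empty pieces.

```python
def split_rows(rows, sep=","):
    """Model of the requirement: turn (id, text) rows into (id, item) rows,
    one per delimited item, keeping the item order within each row."""
    out = []
    for row_id, text in rows:
        for item in text.split(sep):
            item = item.strip()  # the sample data has a space after each comma
            if item:             # skip empty pieces, e.g. from a trailing delimiter
                out.append((row_id, item))
    return out

rows = [(1, "a, b, c, d"), (2, "e, f, g")]
for row_id, item in split_rows(rows):
    print(row_id, item)
```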

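At this point, many users reply with the most classic SQL on ITPUB (written here with chr(10), a newline, as the delimiter, which is what the test data below uses):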

select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
from t1
connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;

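To see what the classic query computes, here is a Python model of it. The helper names are hypothetical, and Python's re stands in for Oracle's regex engine, which is adequate for this simple character class. regexp_substr(text, '[^'||chr(10)||']+', 1, level) returns the level-th run of non-newline characters and always starts searching at position 1. LEVEL is capped at the number of newlines. The prior dbms_random.value>0 predicate is not modelled. It is commonly added so that prior id=id can connect each row to itself without Oracle reporting a CONNECT BY loop.

```python
import re

def nth_item_regexp(text, n, sep="\n"):
    """Model of regexp_substr(text, '[^'||sep||']+', 1, n): the n-th run of
    non-separator characters. The search starts at position 1 on every call,
    so reaching item n means walking past the n - 1 items before it."""
    pattern = "[^" + re.escape(sep) + "]+"
    for k, match in enumerate(re.finditer(pattern, text), start=1):
        if k == n:
            return match.group(0)
    return None  # no n-th occurrence: Oracle returns NULL

def split_connect_by(row_id, text, sep="\n"):
    """Model of the CONNECT BY query for one row: LEVEL runs from 1 to the number
    of separators, length(text) - length(replace(text, sep))."""
    levels = len(text) - len(text.replace(sep, ""))
    return [(row_id, nth_item_regexp(text, level, sep)) for level in range(1, levels + 1)]

print(split_connect_by(1, "a\nb\nc\n"))
```

A pattern of the form [^...]+ never matches an empty string. As a result, consecutive delimiters shift the items. For example, "a\n\nb\n" yields a, then b, then NULL for the third level.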

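Of course, some "regex haters" will implement it with substr wrapped around instr instead.
And of course there are Cartesian-product versions too.
In short, it has become a stock pattern.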

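But is this really a good approach?
I had never thought about it until I actually ran into the problem in my project. Only then did I realize that these SQL formulations are of almost purely theoretical value.
Let's measure.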

create table t1 (id int, text varchar2(4000));
insert into t1
select 1, listagg(lpad('a', 60), chr(10)) within group (order by 1)||chr(10)
from dual
connect by rownum<=64;
insert into t1
select n+1, text
from t1, (select rownum n from dual connect by rownum<=199);

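Here is a quick Python check of what the two INSERTs produce. It is a model, not Oracle. The result is 200 rows, each holding a 3,904-character string of 64 newline-terminated items. Every split query below must therefore return 200 × 64 = 12,800 rows, which matches the "12800 rows" in the transcripts. The string also fits in varchar2(4000).

```python
ITEM = "a".rjust(60)      # lpad('a', 60): 59 spaces followed by 'a'
NL = "\n"                 # chr(10)

# First insert: one row, 64 items joined by chr(10), plus a trailing chr(10).
text = NL.join([ITEM] * 64) + NL
t1 = [(1, text)]
# Second insert: the single existing row joined to n = 1..199 gives ids 2..200.
t1 += [(n + 1, text) for n in range(1, 200)]

rows = len(t1)
items_per_row = text.count(NL)
print(rows, items_per_row, rows * items_per_row, len(text))
```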

--SQL connect by version
bill@ORCL> select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
from t1
connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
12800 rows selected.
Elapsed: 00:00:16.24
Execution Plan
----------------------------------------------------------
Plan hash value: 3874795171
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
|* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
| 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=PRIOR "ID")
filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
"DBMS_RANDOM"."VALUE"()>0)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
313 consistent gets
0 physical reads
0 redo size
228399 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
12800 rows processed

The 16 seconds took me by surprise; I didn't expect it to be this slow. Let's try the version with substr wrapped around instr.

bill@ORCL> select id, substr(text,
decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)),
instr(text, chr(10), 1, level)-decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)
from t1
connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
12800 rows selected.
Elapsed: 00:00:04.89
Execution Plan
----------------------------------------------------------
Plan hash value: 3874795171
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
|* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
| 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=PRIOR "ID")
filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
"DBMS_RANDOM"."VALUE"()>0)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
248 consistent gets
0 physical reads
0 redo size
25757310 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
12800 rows processed

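As transcribed, the closing parenthesis of substr comes right after the decode(...) start argument. The query therefore returns two separate columns: substr(text, start), which is everything from the item to the end of the string, and the computed length. That would also fit the roughly 25 MB reported as "bytes sent via SQL*Net to client" for this run. The Python sketch below models both the intended three-argument form and the transcribed form. instr, substr and item_at are hypothetical stand-ins for the Oracle built-ins and cover only positive positions.

```python
def instr(s, sub, start=1, nth=1):
    """Model of Oracle INSTR(s, sub, start, nth) for start >= 1: the 1-based
    position of the nth occurrence of sub at or after start, or 0 if absent."""
    pos = start - 1                  # 0-based index where the next search begins
    for _ in range(nth):
        found = s.find(sub, pos)
        if found < 0:
            return 0
        pos = found + 1              # resume just after this occurrence
    return pos                       # found + 1 is also the 1-based position

def substr(s, start, length=None):
    """Model of Oracle SUBSTR for start >= 1; '' is returned as None (NULL)."""
    if length is None:
        piece = s[start - 1:]
    elif length < 1:
        return None
    else:
        piece = s[start - 1:start - 1 + length]
    return piece or None

def item_at(text, level, sep="\n"):
    # decode(level, 1, 1, instr(text, sep, 1, level-1)+1): start after the (level-1)-th separator
    start = 1 if level == 1 else instr(text, sep, 1, level - 1) + 1
    # instr(text, sep, 1, level) - start: characters up to the level-th separator
    return substr(text, start, instr(text, sep, 1, level) - start)

text = "ab\ncd\nef\n"
print([item_at(text, level) for level in (1, 2, 3)])   # intended three-argument form
print(repr(substr(text, 4)))                            # as transcribed: start only
```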

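Much better. It seems Oracle's regular expressions are best avoided whenever you can.
But I'm still far from satisfied; this is only a handful of rows.
Let's try the Cartesian-product version.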

bill@ORCL> select substr(text,
decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
from t1,
(select rownum lv from dual connect by rownum<=64) b;
12800 rows selected.
Elapsed: 00:00:03.30
Execution Plan
----------------------------------------------------------
Plan hash value: 894562235
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
| 2 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 3 | COUNT | | | | | |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | BUFFER SORT | | 186 | 363K| 70 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | T1 | 186 | 363K| 68 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(ROWNUM<=64)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
48 recursive calls
0 db block gets
435 consistent gets
2 physical reads
0 redo size
359032 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
12800 rows processed

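The Cartesian version replaces CONNECT BY on t1 with a 64-row number generator on DUAL and joins every t1 row to it. Below is a Python model. The name split_cartesian is hypothetical, and items are extracted with str.split rather than instr. It shows that the row count is always len(t1) × max_items. Where a row has fewer items, the result is None, which corresponds to what the intended substr/instr expression yields as NULL. The generator size must therefore be at least the largest item count. Without an ORDER BY, Oracle does not guarantee the row order that the model happens to produce.

```python
from itertools import product

def split_cartesian(t1, max_items, sep="\n"):
    """Model of joining t1 to (select rownum lv from dual connect by rownum <= max_items).
    Every (id, text) row is paired with every lv, so the result always has
    len(t1) * max_items rows; lv beyond a row's item count yields None (NULL)."""
    out = []
    for (row_id, text), lv in product(t1, range(1, max_items + 1)):
        items = text.split(sep)[:-1]   # items are separator-terminated in the test data
        out.append((row_id, items[lv - 1] if lv <= len(items) else None))
    return out

t1 = [(1, "a\nb\nc\n"), (2, "d\ne\n")]
print(split_cartesian(t1, 3))
```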

Faster again, but only slightly; the order of magnitude isn't going to change.
Let's try a physical table.

create table t2 (id int, lv int, text varchar2(4000));
insert into t2
select (n-1)*200+id, n, text
from t1, (select rownum n from dual connect by rownum<=64);
bill@ORCL> select substr(text,
decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
from t2;
12800 rows selected.
Elapsed: 00:00:03.53
Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15663 | 30M| 3567 (1)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T2 | 15663 | 30M| 3567 (1)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
13005 consistent gets
12997 physical reads
0 redo size
3526140 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12800 rows processed

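Judging from these results, it looks as if Oracle first copies the whole table 64 times and only then takes the substr, regardless of how much space the selected substr actually needs. If that is so, the memory operations here are very inefficient.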
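To put a number on this hypothesis, here is a back-of-the-envelope sketch. It assumes one byte per character and ignores row overhead. If each of the 12,800 joined rows really carries its own copy of the 3,904-character text, about 50 million characters pass through the join. That is roughly 65 times the 768,000 characters the query actually returns.

```python
ROWS, ITEMS = 200, 64
TEXT_LEN = 64 * 61          # 64 items of 60 characters, each followed by chr(10)
ITEM_LEN = 60

copied = ROWS * ITEMS * TEXT_LEN     # if every joined row carries the full text
needed = ROWS * ITEMS * ITEM_LEN     # what the query finally returns
print(f"{copied:,} vs {needed:,} characters, ratio {copied / needed:.1f}")
```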
Let's run two more experiments.

bill@ORCL> select substr(text,
decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
from t1,
(
select id, lv
from (select id from t1),
(select rownum lv from dual connect by rownum<=64)
where rownum>0
) b
where t1.id=b.id;
12800 rows selected.
Elapsed: 00:00:03.56
Execution Plan
----------------------------------------------------------
Plan hash value: 553474792
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 187 | 372K| 71 (0)| 00:00:01 |
|* 1 | HASH JOIN | | 187 | 372K| 71 (0)| 00:00:01 |
| 2 | VIEW | | 186 | 4836 | 3 (0)| 00:00:01 |
| 3 | COUNT | | | | | |
|* 4 | FILTER | | | | | |
| 5 | MERGE JOIN CARTESIAN | | 186 | 4836 | 3 (0)| 00:00:01 |
| 6 | VIEW | | 1 | 13 | 2 (0)| 00:00:01 |
| 7 | COUNT | | | | | |
|* 8 | CONNECT BY WITHOUT FILTERING| | | | | |
| 9 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 10 | BUFFER SORT | | 186 | 2418 | 3 (0)| 00:00:01 |
| 11 | INDEX FULL SCAN | SYS_C0010642 | 186 | 2418 | 1 (0)| 00:00:01 |
| 12 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="B"."ID")
4 - filter(ROWNUM>0)
8 - filter(ROWNUM<=64)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
15 recursive calls
0 db block gets
515 consistent gets
9 physical reads
0 redo size
25712495 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
12800 rows processed

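In this plan, the HASH JOIN builds a hash table from view b (the id/lv pairs) and probes it with the full scan of T1. A minimal Python model of a hash join makes one point visible. The names are hypothetical, and this is not Oracle's implementation. Every joined row still pairs an (id, lv) with the complete text of that id. The amount of string data reaching substr/instr is therefore the same as in the plain Cartesian version, which may be why the timing barely changes.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Minimal in-memory hash join: hash the build input on its key once,
    then look up each probe row and yield one (probe, build) pair per match."""
    table = defaultdict(list)
    for row in build_rows:
        table[build_key(row)].append(row)
    for row in probe_rows:
        for match in table.get(probe_key(row), ()):
            yield row, match

# Build side: view b, the (id, lv) pairs; probe side: t1 with its full text.
b = [(row_id, lv) for row_id in (1, 2) for lv in (1, 2, 3)]
t1 = [(1, "a\nb\nc\n"), (2, "d\ne\nf\n")]
joined = list(hash_join(b, t1, build_key=lambda r: r[0], probe_key=lambda r: r[0]))
print(len(joined), joined[0])
```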

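As you can see, the rewrite runs at the same speed, if anything slightly worse.
Now let's see how it performs when we tell Oracle explicitly which content to select: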

bill@ORCL> select substr(t1.id, instr(text, chr(10), 1, 58), instr(text, chr(10), 1, 59)-instr(text, chr(10), 1, 58)-1) item_txt
from t1,
(select rownum lv from dual connect by rownum<=64) b;
12800 rows selected.
Elapsed: 00:00:01.26
Execution Plan
Plan hash value: 894562235
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
| 2 | VIEW | | 1 | | 2 (0)| 00:00:01 |
| 3 | COUNT | | | | | |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | BUFFER SORT | | 186 | 366K| 70 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(ROWNUM<=64)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
380 consistent gets
2 physical reads
0 redo size
227513 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
12800 rows processed

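Two details of this query as transcribed are worth checking. First, the first argument of substr is t1.id rather than text. Second, the start position is the 58th newline itself rather than the character after it. t1.id converts to a string of at most three characters, while the 58th newline sits at position 58 × 61 = 3538 in the test data. The expression therefore evaluates to NULL on every row. The Python sketch below checks those positions. The helper name instr_nth is hypothetical. The sketch also shows the presumably intended substr(text, p58 + 1, length), which returns item 59.

```python
ITEM = "a".rjust(60)                   # lpad('a', 60)
text = "\n".join([ITEM] * 64) + "\n"   # one t1.text value from the test data

def instr_nth(s, sub, nth):
    """1-based position of the nth occurrence of sub in s, searching from
    position 1 like instr(s, sub, 1, nth); 0 if there is no such occurrence."""
    pos = 0
    for _ in range(nth):
        pos = s.find(sub, pos) + 1     # find returns -1 when absent, giving 0
        if pos == 0:
            return 0
    return pos

p58, p59 = instr_nth(text, "\n", 58), instr_nth(text, "\n", 59)
length = p59 - p58 - 1
as_written = str(1)[p58 - 1:p58 - 1 + length] or None   # substr(t1.id, p58, length) for id = 1
intended = text[p58:p58 + length]                        # substr(text, p58 + 1, length): item 59
print(p58, p59, length, as_written, repr(intended.strip()))
```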

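Faster again.
So after these tests it seems we can conclude that CPU computation and memory operations are now important factors to consider. They always were; it's just that we never seem to be able to do much to optimize them.
The obvious corollary is that substr and instr on long strings are naturally slower, and assigning long values to variables is less efficient too.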
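Here is a rough way to see why string length matters, using a simplified cost model. The model assumes that a search examines each character once, from its start position up to the match. Locating item k with instr(text, chr(10), 1, k) or regexp_substr(text, ..., 1, k) starts at position 1 every time, so the total work grows quadratically with the number of items. Resuming from the previous separator is linear. The PL/SQL versions below do this with a saved position. The instr query actually calls instr twice per row, so this model undercounts it.

```python
SEP = "\n"
text = SEP.join(["a".rjust(60)] * 64) + SEP   # one t1.text value from the test data

def separator_positions(s, sep=SEP):
    """0-based indexes of every separator in s."""
    return [i for i, ch in enumerate(s) if ch == sep]

def scanned_from_start(s, sep=SEP):
    """Characters examined when item k is located by searching for the k-th
    separator from position 1 on every call: each search reads up to that separator."""
    return sum(end + 1 for end in separator_positions(s, sep))

def scanned_incremental(s, sep=SEP):
    """Characters examined when each search resumes just after the previous
    separator: the string is read once, up to its last separator."""
    positions = separator_positions(s, sep)
    return positions[-1] + 1 if positions else 0

print(scanned_from_start(text), scanned_incremental(text))
```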
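Having got this far, it seemed time to turn to PL/SQL.
In my requirement, the split results have to be joined with other tables afterwards, so the natural choice was to try a pipelined table function.
The first version of the code isn't hard to write, apart from PL/SQL's convoluted syntax (convoluted compared with SQL, that is; users of other high-level languages may laugh).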

create or replace package refcur_pkg_v1
authid current_user
as
type inrec is record (
id number(38),
text varchar2(4000));
type refcur_t is ref cursor return inrec;
type outrec_typ is record (
id number(38),
item_txt varchar2(4000));
type outrecset is table of outrec_typ;
function f_cartesian (p refcur_t) return outrecset pipelined
parallel_enable (partition p by any);
end;
/
create or replace PACKAGE BODY refcur_pkg_v1 IS
FUNCTION f_cartesian (p refcur_t) RETURN outrecset PIPELINED
parallel_enable (partition p by any)
IS
in_rec p%ROWTYPE;
out_rec outrec_typ;
C_NL_TERM varchar2(2) := chr(10); --unix style
v_nl_pos int:=0;
v_tmp_pos int:=0;
BEGIN
LOOP
FETCH p INTO in_rec; -- input row
EXIT WHEN p%NOTFOUND;
v_nl_pos :=0;
v_tmp_pos :=0;
out_rec.id :=in_rec.id;
FOR i IN 1..100000 LOOP
v_tmp_pos:=instr(in_rec.text, C_NL_TERM, v_nl_pos+1);
exit when v_tmp_pos=0;
out_rec.item_txt :=substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1);
v_nl_pos :=v_tmp_pos;
PIPE ROW(out_rec);
END LOOP;
END LOOP;
CLOSE p;
RETURN;
END f_cartesian;
END refcur_pkg_v1;
/

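The key difference from the SQL versions is that each instr call resumes at v_nl_pos+1 instead of position 1, and rows are piped out as they are produced. A Python generator is a close analogue of a pipelined function. The sketch below mirrors the loop, translating the 1-based positions to Python's 0-based slicing. Like the PL/SQL, it never emits a final item that is not followed by chr(10), which is why the test data ends with a newline.

```python
from typing import Iterable, Iterator, Tuple

def f_cartesian(rows: Iterable[Tuple[int, str]], sep: str = "\n") -> Iterator[Tuple[int, str]]:
    """Generator model of refcur_pkg_v1.f_cartesian: for each fetched row, find the
    next separator starting just after the previous one and yield ("pipe") the item."""
    for row_id, text in rows:                      # FETCH p INTO in_rec
        nl_pos = 0                                 # v_nl_pos: 1-based, 0 = before the text
        while True:
            tmp_pos = text.find(sep, nl_pos) + 1   # instr(text, sep, nl_pos + 1); 0 if not found
            if tmp_pos == 0:
                break
            yield row_id, text[nl_pos:tmp_pos - 1] # substr(text, nl_pos + 1, tmp_pos - nl_pos - 1)
            nl_pos = tmp_pos

print(list(f_cartesian([(1, "a\nb\n"), (2, "c\nd")])))
```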

bill@ORCL> select id, item_txt
from table(refcur_pkg_v1.f_cartesian(cursor(
select id, text
from t1
)));
12800 rows selected.
Elapsed: 00:00:00.70
Execution Plan
----------------------------------------------------------
Plan hash value: 4049074522
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
| 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
| 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 | | 29 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
217 recursive calls
0 db block gets
264 consistent gets
0 physical reads
0 redo size
228399 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12800 rows processed

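0.7 seconds! It looks like the direction is right.
But the data volume in my requirement is very large, so it would be good to optimize further.
The current version loops while carrying the whole text along, so the natural next step is to try bisection.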

create or replace package refcur_pkg
authid current_user
as
type inrec is record (
id number(38),
lines number(38),
text varchar2(4000));
type refcur_t is ref cursor return inrec;
type outrec_typ is record (
id number(38),
item_txt varchar2(4000));
type outrecset is table of outrec_typ;
function f_cartesian (p refcur_t) return outrecset pipelined
parallel_enable (partition p by any);
function f_cartesian2 (p refcur_t) return outrecset pipelined
parallel_enable (partition p by any);
function f_cartesian3 (p refcur_t) return outrecset pipelined
parallel_enable (partition p by any);
end;
/
create or replace package body refcur_pkg IS
function f_cartesian (p refcur_t) return outrecset pipelined
parallel_enable (partition p by any)
is
in_rec p%ROWTYPE;
out_rec outrec_typ;
C_NL_TERM varchar2(2) := chr(10); --unix style
C_NL_LENG int := 1; --unix new line length
C_SIZE int:=64; --split clob to varchar by every C_SIZE-th newline character
C_MAX_LEN INT:=1024; --max length for item_txt; if change, also change item_txt varchar2(1024)
v_div_pos int; --split position (newline) in in_rec.text, level 1
v1_div_pos int; --split position in v1_substr, level 2
v2_div_pos int; --split position in v2_substr, level 3
v3_div_pos int; --split position in v3_substr, level 4
v4_div_pos int; --split position in v4_substr, level 5
v5_div_pos int; --split position in v5_substr, level 6
v1_substr varchar2(4000);
v2_substr varchar2(4000);
v3_substr varchar2(4000);
v4_substr varchar2(4000);
v5_substr varchar2(4000);
begin
loop
fetch p into in_rec; -- input row
exit when p%NOTFOUND;
out_rec.id :=in_rec.id;
--if lines=C_SIZE, then dichotomy
if in_rec.lines=C_SIZE then
v_div_pos :=instr(in_rec.text, C_NL_TERM, 1, C_SIZE/2);
for a in 1..2 loop
v1_substr :=substr(in_rec.text, 1+v_div_pos*(a-1), v_div_pos*(2-a)+1e6*(a-1));
v1_div_pos :=instr(v1_substr, C_NL_TERM, 1, C_SIZE/4);
for b in 1..2 loop
v2_substr :=substr(v1_substr, 1+v1_div_pos*(b-1), v1_div_pos*(2-b)+1e6*(b-1));
v2_div_pos :=instr(v2_substr, C_NL_TERM, 1, C_SIZE/8);
for c in 1..2 loop
v3_substr :=substr(v2_substr, 1+v2_div_pos*(c-1), v2_div_pos*(2-c)+1e6*(c-1));
v3_div_pos :=instr(v3_substr, C_NL_TERM, 1, C_SIZE/16);
for d in 1..2 loop
v4_substr :=substr(v3_substr, 1+v3_div_pos*(d-1), v3_div_pos*(2-d)+1e6*(d-1));
v4_div_pos :=instr(v4_substr, C_NL_TERM, 1, C_SIZE/32);
for e in 1..2 loop
v5_substr :=substr(v4_substr, 1+v4_div_pos*(e-1), v4_div_pos*(2-e)+1e6*(e-1));
v5_div_pos :=instr(v5_substr, C_NL_TERM, 1, 1);
for f in 1..2 loop
out_rec.item_txt :=substr(v5_substr, 1+v5_div_pos*(f-1), (v5_div_pos-1)*(2-f)+(length(v5_substr)-v5_div_pos-1)*(f-1)); --f=2: the second item runs up to the last newline of v5_substr
exit when out_rec.item_txt is null;
pipe row(out_rec);
end loop;
end loop;
end loop;
end loop;
end loop;
end loop;
--otherwise (lines<>C_SIZE), use the ordinary loop method
else
v_div_pos:=0;
v1_div_pos:=0;
for i in 1..1000000000 loop
v1_div_pos:=instr(in_rec.text, C_NL_TERM, v_div_pos+1);
exit when v1_div_pos=0;
out_rec.item_txt :=substr(in_rec.text, v_div_pos+1, v1_div_pos-v_div_pos-1);
v_div_pos :=v1_div_pos;
pipe row(out_rec);
end loop;
end if;
end loop;
close p;
return;
end f_cartesian;
function f_cartesian2 (p refcur_t) return outrecset pipelined
parallel_enable (partition p by any)
is
in_rec p%ROWTYPE;
out_rec outrec_typ;
begin
loop
fetch p into in_rec; -- input row
exit when p%NOTFOUND;
out_rec.id :=in_rec.id;
out_rec.item_txt :=in_rec.text;
for i in 1..in_rec.lines loop
pipe row(out_rec);
end loop;
end loop;
close p;
return;
end f_cartesian2;
function f_cartesian3 (p refcur_t) return outrecset pipelined
parallel_enable (partition p by any)
is
in_rec p%ROWTYPE;
out_rec outrec_typ;
begin
loop
fetch p into in_rec; -- input row
exit when p%NOTFOUND;
out_rec.id :=in_rec.id;
out_rec.item_txt :=substr(in_rec.text, 1, 64);
for i in 1..in_rec.lines loop
pipe row(out_rec);
end loop;
end loop;
close p;
return;
end f_cartesian3;
end refcur_pkg;
/

bill@ORCL> select id, item_txt
from table(refcur_pkg.f_cartesian(cursor(
select id, length(text)-length(replace(text, chr(10))), text
from t1
)));
12800 rows selected.
Elapsed: 00:00:00.40
Execution Plan
----------------------------------------------------------
Plan hash value: 4049074522
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
| 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
| 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 | | 29 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
258 recursive calls
0 db block gets
469 consistent gets
1 physical reads
0 redo size
228399 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12800 rows processed

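I'd rather not explain the bisection code in detail, because it is very ugly, while the logic itself is actually simple.
In my processing the split starts from a CLOB. I first cut the earlier result into large varchar values of 64 lines each, and at this point apply a hard-coded 6-level bisection (2^6 = 64).
If the number of lines in your data varies, you can adjust the if condition and the number of bisection levels a little, so you don't have to agonize over the number of levels.
In any case, the performance gain is clearly significant. In my tests I also found that storing the instr result in a variable first is more efficient,
that is, writing it like substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1). The code above could still be improved along those lines.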
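The six hard-coded nested loops in f_cartesian are one unrolled instance of a general recursive bisection. The chunk is cut after its (n/2)-th newline, the procedure recurses on both halves, and each item is extracted only once the string is short. The generic Python sketch below handles any item count, which is the adjustment suggested above for variable line counts. The name bisect_items is hypothetical, and the sketch assumes the chunk holds exactly n newline-terminated items, as the 64-line chunks do. It takes each item's length from its own half, so items need not be equal in length.

```python
def bisect_items(chunk: str, n: int, sep: str = "\n"):
    """Yield the n separator-terminated items of chunk, in order, by recursive
    bisection: cut after the (n // 2)-th separator and recurse on both halves,
    so each item is finally taken from a short string."""
    if n == 1:
        yield chunk[:-1] if chunk.endswith(sep) else chunk
        return
    half = n // 2
    cut = -1
    for _ in range(half):                  # like instr(chunk, sep, 1, half), but 0-based
        cut = chunk.find(sep, cut + 1)
    yield from bisect_items(chunk[:cut + 1], half, sep)
    yield from bisect_items(chunk[cut + 1:], n - half, sep)

text = "".join(f"line{i}\n" for i in range(64))
items = list(bisect_items(text, text.count("\n")))
print(len(items), items[0], items[-1])
```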

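Purely for testing, I also wrote f_cartesian2 and f_cartesian3:
the former simply replicates the data, and the latter takes a fixed substr for each row. In these two cases PL/SQL has no real advantage.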

--SQL connect by
bill@ORCL> select id, text
from t1
connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
12800 rows selected.
Elapsed: 00:00:06.81
Execution Plan
Plan hash value: 3874795171
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 366K| 68 (0)| 00:00:01 |
|* 1 | CONNECT BY WITHOUT FILTERING| | | | | |
| 2 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ID"=PRIOR "ID")
filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
"DBMS_RANDOM"."VALUE"()>0)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
313 consistent gets
0 physical reads
0 redo size
236099 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
12800 rows processed
--SQL Cartesian
bill@ORCL> select id, text
from t1,
(select rownum n from dual connect by rownum<=64) b;
12800 rows selected.
Elapsed: 00:00:05.00
Execution Plan
Plan hash value: 894562235
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 186 | 366K| 70 (0)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN | | 186 | 366K| 70 (0)| 00:00:01 |
| 2 | VIEW | | 1 | | 2 (0)| 00:00:01 |
| 3 | COUNT | | | | | |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 6 | BUFFER SORT | | 186 | 366K| 70 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(ROWNUM<=64)
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
7 recursive calls
0 db block gets
378 consistent gets
0 physical reads
0 redo size
280133 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
12800 rows processed
--pipelined table function
bill@ORCL> select id, item_txt
from table(refcur_pkg.f_cartesian2(cursor(
select id, length(text)-length(replace(text, chr(10))), text
from t1
)));
12800 rows selected.
Elapsed: 00:00:05.16
Execution Plan
----------------------------------------------------------
Plan hash value: 2991304848
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
| 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
| 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN2 | 8168 | | 29 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
266 recursive calls
0 db block gets
529 consistent gets
0 physical reads
0 redo size
236103 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12800 rows processed
bill@ORCL> select id, item_txt
from table(refcur_pkg.f_cartesian3(cursor(
select id, length(text)-length(replace(text, chr(10))), text
from t1
)));
12800 rows selected.
Elapsed: 00:00:00.29
Execution Plan
----------------------------------------------------------
Plan hash value: 2831580648
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 271K| 29 (0)| 00:00:01 |
| 1 | VIEW | | 8168 | 271K| 29 (0)| 00:00:01 |
| 2 | COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN3 | 8168 | | 29 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL | T1 | 186 | 366K| 68 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Statistics
----------------------------------------------------------
278 recursive calls
0 db block gets
555 consistent gets
0 physical reads
124 redo size
228407 bytes sent via SQL*Net to client
9927 bytes received via SQL*Net from client
855 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
12800 rows processed

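One reading of these four results is consistent with the earlier copy-volume hypothesis. The three slow runs (about 5 to 7 seconds) all return the full 3,904-character text on each of the 12,800 rows, while f_cartesian3 returns only a 64-character prefix per row. The arithmetic below assumes one byte per character and shows that the volume differs by a factor of exactly 61.

```python
ROWS, LINES = 200, 64
TEXT_LEN, PREFIX_LEN = 3904, 64      # full t1.text vs substr(text, 1, 64)

# f_cartesian2 (like the two SQL queries) emits the whole text once per line;
# f_cartesian3 emits a 64-character prefix once per line.
full_copy = ROWS * LINES * TEXT_LEN
prefix_copy = ROWS * LINES * PREFIX_LEN
print(full_copy, prefix_copy, full_copy // prefix_copy)
```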

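All these tests and comparisons were quite a shock to me. Building a Cartesian product in SQL and then extracting different data from each row seems like exactly the kind of job SQL is naturally suited for. Unfortunately, in the end I still don't know whether this task simply isn't a good fit for SQL or whether Oracle hasn't implemented it well. Why not test it on other databases?
In any case, from now on, whenever this kind of requirement comes up in Oracle, consider the PL/SQL approach described here.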
