Reposted from: http://www.itpub.net/thread-1840767-1-1.html
Up front, the idea from which I personally gained the most:
Understanding the CBO matters for Oracle tuning, and so does analyzing execution plans;
but tuning goes far beyond the CBO, and I think the reflex of equating tuning with the CBO is something of a fallacy.
Do we actually know which step of a SQL statement or a piece of PL/SQL costs the most time, and why?
--******************************************************
On to the main topic.
Many forum users have asked a question like this:
My table has a column text containing a delimiter ','; I need to split the data on that delimiter into n rows, each row taking, in order, the piece between delimiters.
For example, given the data
id text
1 a, b, c, d
2 e, f, g
the result I need is
1 a
1 b
1 c
1 d
2 e
2 f
2 g
At this point, many users reply with the classic itpub SQL:
select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
from t1
connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;
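To pin down the semantics being asked for, here is a minimal Python sketch (my own illustration, not from the original thread) of the expected row expansion. Note that although the problem statement uses ',', all of the author's test data and SQL use chr(10) as the delimiter, so the sketch takes the delimiter as a parameter:

```python
def explode(rows, delim="\n"):
    """Expand (id, text) pairs into one (id, item) pair per
    delimiter-separated piece, preserving order within each id."""
    out = []
    for rid, text in rows:
        for item in text.split(delim):
            if item:  # skip the empty piece after a trailing delimiter
                out.append((rid, item))
    return out

# the forum example, with '\n' standing in for the delimiter
print(explode([(1, "a\nb\nc\nd"), (2, "e\nf\ng")]))
# → [(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'e'), (2, 'f'), (2, 'g')]
```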
Of course, there may also be "regex haters" who implement it with substr nested around instr.
And of course there is the Cartesian-product version.
In short, this has become a standard routine.
But is it really a good idea?
I had never thought about it until I hit this problem in a real project, and then I discovered that these SQL formulations have almost only theoretical value.
Let's test them.
create table t1 (id int, text varchar2(4000));

insert into t1
select 1, listagg(lpad('a', 60), chr(10)) within group (order by 1)||chr(10)
from dual
connect by rownum<=64;

insert into t1
select n+1, text
from t1, (select rownum n from dual connect by rownum<=199);
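As a sanity check on the figures that follow, the shape of this test data can be mirrored in Python (an illustration under my own reading of the DDL): 200 ids, each text holding 64 lines of 60 characters, every line terminated by chr(10), which is why every query below returns 12800 rows.

```python
# lpad('a', 60) pads 'a' to length 60 with spaces on the left
line = "a".rjust(60)
# listagg(..., chr(10)) over 64 rows, plus the trailing chr(10)
text = "\n".join([line] * 64) + "\n"
# the original row plus 199 copies = 200 rows in t1
rows = [(i + 1, text) for i in range(200)]

assert len(line) == 60
assert text.count("\n") == 64                 # 64 delimiters per row
assert len(rows) * text.count("\n") == 12800  # rows returned by every test
```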
--SQL connect by version
bill@ORCL> select id, regexp_substr(text, '[^'||chr(10)||']+', 1, level) item_txt
  2  from t1
  3  connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;

12800 rows selected.

Elapsed: 00:00:16.24

Execution Plan
----------------------------------------------------------
Plan hash value: 3874795171

-------------------------------------------------------------------------------------
| Id | Operation                     | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT              |      |  186 |  366K |    68  (0) | 00:00:01 |
|* 1 |  CONNECT BY WITHOUT FILTERING |      |      |       |            |          |
|  2 |   TABLE ACCESS FULL           | T1   |  186 |  366K |    68  (0) | 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=PRIOR "ID")
       filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
              "DBMS_RANDOM"."VALUE"()>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
        313  consistent gets
          0  physical reads
          0  redo size
     228399  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      12800  rows processed
That 16 seconds rather surprised me; I did not expect it to be this slow. Let's try the substr-with-instr version.

bill@ORCL> select id, substr(text,
  2  decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)),
  3  instr(text, chr(10), 1, level)-decode(level, 1, 1, instr(text, chr(10), 1, level-1)+1)
  4  from t1
  5  connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;

12800 rows selected.

Elapsed: 00:00:04.89

Execution Plan
----------------------------------------------------------
Plan hash value: 3874795171

-------------------------------------------------------------------------------------
| Id | Operation                     | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT              |      |  186 |  366K |    68  (0) | 00:00:01 |
|* 1 |  CONNECT BY WITHOUT FILTERING |      |      |       |            |          |
|  2 |   TABLE ACCESS FULL           | T1   |  186 |  366K |    68  (0) | 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=PRIOR "ID")
       filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
              "DBMS_RANDOM"."VALUE"()>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        248  consistent gets
          0  physical reads
          0  redo size
   25757310  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      12800  rows processed
Much better; it seems Oracle regular expressions are best avoided whenever possible.
But I am still far from satisfied, since this is hardly any data at all.
Let's try the Cartesian-product version.
bill@ORCL> select substr(text,
  2  decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
  3  instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
  4  from t1,
  5  (select rownum lv from dual connect by rownum<=64) b;

12800 rows selected.

Elapsed: 00:00:03.30

Execution Plan
----------------------------------------------------------
Plan hash value: 894562235

----------------------------------------------------------------------------------------
| Id | Operation                        | Name | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                 |      |  186 |  366K |    70  (0) | 00:00:01 |
|  1 |  MERGE JOIN CARTESIAN            |      |  186 |  366K |    70  (0) | 00:00:01 |
|  2 |   VIEW                           |      |    1 |    13 |     2  (0) | 00:00:01 |
|  3 |    COUNT                         |      |      |       |            |          |
|* 4 |     CONNECT BY WITHOUT FILTERING |      |      |       |            |          |
|  5 |      FAST DUAL                   |      |    1 |       |     2  (0) | 00:00:01 |
|  6 |   BUFFER SORT                    |      |  186 |  363K |    70  (0) | 00:00:01 |
|  7 |    TABLE ACCESS FULL             | T1   |  186 |  363K |    68  (0) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(ROWNUM<=64)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
         48  recursive calls
          0  db block gets
        435  consistent gets
          2  physical reads
          0  redo size
     359032  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          8  sorts (memory)
          0  sorts (disk)
      12800  rows processed
A bit faster again, but only a bit; the order of magnitude is not going to change.
Let's try a physical table.
create table t2 (id int, lv int, text varchar2(4000));

insert into t2
select (n-1)*200+id, n, text
from t1, (select rownum n from dual connect by rownum<=64);

bill@ORCL> select substr(text,
  2  decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
  3  instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
  4  from t2;

12800 rows selected.

Elapsed: 00:00:03.53

Execution Plan
----------------------------------------------------------
Plan hash value: 1513984157

--------------------------------------------------------------------------
| Id | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|  0 | SELECT STATEMENT   |      | 15663 |   30M |  3567  (1) | 00:00:01 |
|  1 |  TABLE ACCESS FULL | T2   | 15663 |   30M |  3567  (1) | 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      13005  consistent gets
      12997  physical reads
          0  redo size
    3526140  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12800  rows processed
From these observations, it looks as though Oracle copies the whole table 64 times and only then takes the substr, no matter how little space the selected substr actually needs. If so, the in-memory work here is remarkably inefficient.
Let's run two more experiments.
bill@ORCL> select substr(text,
  2  decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)),
  3  instr(text, chr(10), 1, lv)-decode(lv, 1, 1, instr(text, chr(10), 1, lv-1)+1)
  4  from t1,
  5  (
  6  select id, lv
  7  from (select id from t1),
  8  (select rownum lv from dual connect by rownum<=64)
  9  where rownum>0
 10  ) b
 11  where t1.id=b.id;

12800 rows selected.

Elapsed: 00:00:03.56

Execution Plan
----------------------------------------------------------
Plan hash value: 553474792

----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |              |  187 |  372K |    71  (0) | 00:00:01 |
|* 1 |  HASH JOIN                          |              |  187 |  372K |    71  (0) | 00:00:01 |
|  2 |   VIEW                              |              |  186 |  4836 |     3  (0) | 00:00:01 |
|  3 |    COUNT                            |              |      |       |            |          |
|* 4 |     FILTER                          |              |      |       |            |          |
|  5 |      MERGE JOIN CARTESIAN           |              |  186 |  4836 |     3  (0) | 00:00:01 |
|  6 |       VIEW                          |              |    1 |    13 |     2  (0) | 00:00:01 |
|  7 |        COUNT                        |              |      |       |            |          |
|* 8 |         CONNECT BY WITHOUT FILTERING|              |      |       |            |          |
|  9 |          FAST DUAL                  |              |    1 |       |     2  (0) | 00:00:01 |
| 10 |       BUFFER SORT                   |              |  186 |  2418 |     3  (0) | 00:00:01 |
| 11 |        INDEX FULL SCAN              | SYS_C0010642 |  186 |  2418 |     1  (0) | 00:00:01 |
| 12 |   TABLE ACCESS FULL                 | T1           |  186 |  366K |    68  (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."ID"="B"."ID")
   4 - filter(ROWNUM>0)
   8 - filter(ROWNUM<=64)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
         15  recursive calls
          0  db block gets
        515  consistent gets
          9  physical reads
          0  redo size
   25712495  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      12800  rows processed
As you can see, the rewrite runs at the same speed, if anything slightly worse.
Now let's see how Oracle does when we tell it explicitly which part of the column we are selecting.
bill@ORCL> select substr(t1.id, instr(text, chr(10), 1, 58), instr(text, chr(10), 1, 59)-instr(text, chr(10), 1, 58)-1) item_txt
  2  from t1,
  3  (select rownum lv from dual connect by rownum<=64) b;

12800 rows selected.

Elapsed: 00:00:01.26

Execution Plan
----------------------------------------------------------
Plan hash value: 894562235

----------------------------------------------------------------------------------------
| Id | Operation                        | Name | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                 |      |  186 |  366K |    70  (0) | 00:00:01 |
|  1 |  MERGE JOIN CARTESIAN            |      |  186 |  366K |    70  (0) | 00:00:01 |
|  2 |   VIEW                           |      |    1 |       |     2  (0) | 00:00:01 |
|  3 |    COUNT                         |      |      |       |            |          |
|* 4 |     CONNECT BY WITHOUT FILTERING |      |      |       |            |          |
|  5 |      FAST DUAL                   |      |    1 |       |     2  (0) | 00:00:01 |
|  6 |   BUFFER SORT                    |      |  186 |  366K |    70  (0) | 00:00:01 |
|  7 |    TABLE ACCESS FULL             | T1   |  186 |  366K |    68  (0) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(ROWNUM<=64)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        380  consistent gets
          2  physical reads
          0  redo size
     227513  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      12800  rows processed
Faster again.
After these tests, it seems we can conclude that CPU work and memory operations are now important factors to weigh (they always were; we just never seem able to do much about them).
The obvious corollary: substr and instr over a long string are naturally slower, and assigning those values to variables is less efficient too.
At this point it seems time to turn to PL/SQL.
Because in my requirement the exploded rows must then be joined to other tables, the natural thing to try is a pipelined table function.
The first version of the code is not hard to write, apart from PL/SQL's convoluted syntax; convoluted relative to SQL, that is, so other high-level languages may smirk.
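The cost behind this conclusion can be made concrete outside Oracle. instr(text, delim, 1, level) restarts its scan from the beginning of the string for every level, so splitting an n-character row that way is O(n²) in the row length, while remembering the previous match position is O(n). A small Python sketch of the two strategies (my analogy, not Oracle internals):

```python
def split_rescan(s, delim="\n"):
    # like instr(s, delim, 1, k) per item: re-scan from the start for
    # the k-th delimiter every time, so total work grows quadratically
    items, k, prev = [], 1, 0
    while True:
        i, seen = -1, 0
        while seen < k:
            i = s.find(delim, i + 1)
            if i == -1:
                return items
            seen += 1
        items.append(s[prev:i])
        prev, k = i + 1, k + 1

def split_incremental(s, delim="\n"):
    # like caching v_nl_pos in the PL/SQL version below: resume the
    # scan at the previous match, so total work is linear
    items, prev = [], 0
    while True:
        i = s.find(delim, prev)
        if i == -1:
            return items
        items.append(s[prev:i])
        prev = i + 1

row = ("a" * 60 + "\n") * 64  # one row of the test data
assert split_rescan(row) == split_incremental(row) == ["a" * 60] * 64
```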
create or replace package refcur_pkg_v1
authid current_user
as
  type inrec is record (
    id   number(38),
    text varchar2(4000));
  type refcur_t is ref cursor return inrec;
  type outrec_typ is record (
    id       number(38),
    item_txt varchar2(4000));
  type outrecset is table of outrec_typ;
  function f_cartesian (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any);
end;
/

create or replace package body refcur_pkg_v1 is
  function f_cartesian (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any)
  is
    in_rec    p%rowtype;
    out_rec   outrec_typ;
    C_NL_TERM varchar2(2) := chr(10); -- unix-style newline
    v_nl_pos  int := 0;
    v_tmp_pos int := 0;
  begin
    loop
      fetch p into in_rec; -- input row
      exit when p%notfound;

      v_nl_pos   := 0;
      v_tmp_pos  := 0;
      out_rec.id := in_rec.id;
      for i in 1..100000 loop -- upper bound only; the exit below ends the loop
        v_tmp_pos := instr(in_rec.text, C_NL_TERM, v_nl_pos+1);
        exit when v_tmp_pos = 0;
        out_rec.item_txt := substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1);
        v_nl_pos := v_tmp_pos;
        pipe row(out_rec);
      end loop;
    end loop;
    close p;
    return;
  end f_cartesian;
end refcur_pkg_v1;
/
bill@ORCL> select id, item_txt
  2  from table(refcur_pkg_v1.f_cartesian(cursor(
  3  select id, text
  4  from t1
  5  )));

12800 rows selected.

Elapsed: 00:00:00.70

Execution Plan
----------------------------------------------------------
Plan hash value: 4049074522

--------------------------------------------------------------------------------------------------
| Id | Operation                          | Name        | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |             | 8168 |  271K |    29  (0) | 00:00:01 |
|  1 |  VIEW                              |             | 8168 |  271K |    29  (0) | 00:00:01 |
|  2 |   COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 |       |    29  (0) | 00:00:01 |
|  3 |    TABLE ACCESS FULL               | T1          |  186 |  366K |    68  (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
        217  recursive calls
          0  db block gets
        264  consistent gets
          0  physical reads
          0  redo size
     228399  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12800  rows processed
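The shape of f_cartesian translates almost line for line into other languages. Here is a Python analogue (mine, for illustration) of the single forward pass that makes the pipelined version fast: find the next delimiter starting from the previous one, emit the slice between them, advance:

```python
def f_cartesian(rows, delim="\n"):
    # per row: v_tmp_pos = instr(text, delim, v_nl_pos+1);
    # emit substr(text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1); advance v_nl_pos
    for rid, text in rows:
        nl_pos = 0
        while True:
            tmp_pos = text.find(delim, nl_pos)
            if tmp_pos == -1:
                break
            yield rid, text[nl_pos:tmp_pos]  # PIPE ROW(out_rec)
            nl_pos = tmp_pos + 1

assert list(f_cartesian([(1, "a\nb\n"), (2, "c\n")])) == [(1, "a"), (1, "b"), (2, "c")]
```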
0.7 seconds! So the direction is right.
But the data volume in my requirement is huge, so ideally this can be pushed further.
The current version drags the whole text through the loop, so the natural next step is to try a binary split.
create or replace package refcur_pkg
authid current_user
as
  type inrec is record (
    id    number(38),
    lines number(38),
    text  varchar2(4000));
  type refcur_t is ref cursor return inrec;
  type outrec_typ is record (
    id       number(38),
    item_txt varchar2(4000));
  type outrecset is table of outrec_typ;
  function f_cartesian (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any);
  function f_cartesian2 (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any);
  function f_cartesian3 (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any);
end;
/

create or replace package body refcur_pkg is
  function f_cartesian (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any)
  is
    in_rec     p%rowtype;
    out_rec    outrec_typ;
    C_NL_TERM  varchar2(2) := chr(10); -- unix-style newline
    C_NL_LENG  int := 1;               -- unix newline length
    C_SIZE     int := 64;   -- split the clob into varchars at every C_SIZE-th newline
    C_MAX_LEN  int := 1024; -- max length for item_txt; if changed, also change item_txt varchar2(1024)
    v_div_pos  int; -- end position of each sub-block
    v1_div_pos int; -- end position of each sub-block
    v2_div_pos int; -- end position of each sub-block
    v3_div_pos int; -- end position of each sub-block
    v4_div_pos int; -- end position of each sub-block
    v5_div_pos int; -- end position of each sub-block
    v1_substr  varchar2(4000);
    v2_substr  varchar2(4000);
    v3_substr  varchar2(4000);
    v4_substr  varchar2(4000);
    v5_substr  varchar2(4000);
  begin
    loop
      fetch p into in_rec; -- input row
      exit when p%notfound;
      out_rec.id := in_rec.id;

      -- if lines=C_SIZE, use the binary split
      if in_rec.lines = C_SIZE then
        v_div_pos := instr(in_rec.text, C_NL_TERM, 1, C_SIZE/2);
        for a in 1..2 loop
          v1_substr  := substr(in_rec.text, 1+v_div_pos*(a-1), v_div_pos*(2-a)+1e6*(a-1));
          v1_div_pos := instr(v1_substr, C_NL_TERM, 1, C_SIZE/4);
          for b in 1..2 loop
            v2_substr  := substr(v1_substr, 1+v1_div_pos*(b-1), v1_div_pos*(2-b)+1e6*(b-1));
            v2_div_pos := instr(v2_substr, C_NL_TERM, 1, C_SIZE/8);
            for c in 1..2 loop
              v3_substr  := substr(v2_substr, 1+v2_div_pos*(c-1), v2_div_pos*(2-c)+1e6*(c-1));
              v3_div_pos := instr(v3_substr, C_NL_TERM, 1, C_SIZE/16);
              for d in 1..2 loop
                v4_substr  := substr(v3_substr, 1+v3_div_pos*(d-1), v3_div_pos*(2-d)+1e6*(d-1));
                v4_div_pos := instr(v4_substr, C_NL_TERM, 1, C_SIZE/32);
                for e in 1..2 loop
                  v5_substr  := substr(v4_substr, 1+v4_div_pos*(e-1), v4_div_pos*(2-e)+1e6*(e-1));
                  v5_div_pos := instr(v5_substr, C_NL_TERM, 1, 1);
                  for f in 1..2 loop
                    out_rec.item_txt := substr(v5_substr, 1+v5_div_pos*(f-1), (v5_div_pos-1)*(2-f)+(v4_div_pos-v5_div_pos-1)*(f-1));
                    exit when out_rec.item_txt is null;
                    pipe row(out_rec);
                  end loop;
                end loop;
              end loop;
            end loop;
          end loop;
        end loop;
      -- otherwise (lines != C_SIZE), fall back to the ordinary loop
      else
        v_div_pos  := 0;
        v1_div_pos := 0;
        for i in 1..1000000000 loop
          v1_div_pos := instr(in_rec.text, C_NL_TERM, v_div_pos+1);
          exit when v1_div_pos = 0;
          out_rec.item_txt := substr(in_rec.text, v_div_pos+1, v1_div_pos-v_div_pos-1);
          v_div_pos := v1_div_pos;
          pipe row(out_rec);
        end loop;
      end if;
    end loop;
    close p;
    return;
  end f_cartesian;

  function f_cartesian2 (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any)
  is
    in_rec  p%rowtype;
    out_rec outrec_typ;
  begin
    loop
      fetch p into in_rec; -- input row
      exit when p%notfound;
      out_rec.id       := in_rec.id;
      out_rec.item_txt := in_rec.text;
      for i in 1..in_rec.lines loop
        pipe row(out_rec);
      end loop;
    end loop;
    close p;
    return;
  end f_cartesian2;

  function f_cartesian3 (p refcur_t) return outrecset pipelined
    parallel_enable (partition p by any)
  is
    in_rec  p%rowtype;
    out_rec outrec_typ;
  begin
    loop
      fetch p into in_rec; -- input row
      exit when p%notfound;
      out_rec.id       := in_rec.id;
      out_rec.item_txt := substr(in_rec.text, 1, 64);
      for i in 1..in_rec.lines loop
        pipe row(out_rec);
      end loop;
    end loop;
    close p;
    return;
  end f_cartesian3;

end refcur_pkg;
/
bill@ORCL> select id, item_txt
  2  from table(refcur_pkg.f_cartesian(cursor(
  3  select id, length(text)-length(replace(text, chr(10))), text
  4  from t1
  5  )));

12800 rows selected.

Elapsed: 00:00:00.40

Execution Plan
----------------------------------------------------------
Plan hash value: 4049074522

--------------------------------------------------------------------------------------------------
| Id | Operation                          | Name        | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |             | 8168 |  271K |    29  (0) | 00:00:01 |
|  1 |  VIEW                              |             | 8168 |  271K |    29  (0) | 00:00:01 |
|  2 |   COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN | 8168 |       |    29  (0) | 00:00:01 |
|  3 |    TABLE ACCESS FULL               | T1          |  186 |  366K |    68  (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
        258  recursive calls
          0  db block gets
        469  consistent gets
          1  physical reads
          0  redo size
     228399  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12800  rows processed
I do not particularly want to walk through the binary-split code, because it is ugly; the underlying logic, though, is simple.
In my real processing the input is exploded from a clob, so I first chunk the earlier results into large varchars of 64 lines each, and only then apply the hard-coded 6-level binary split.
If your row count varies, you can tweak the if condition and the number of split levels a little, so there is no need to agonize over the depth.
In short, the performance gain we saw is significant. Also, in my testing I found that storing the instr result in a variable first, i.e. the substr(in_rec.text, v_nl_pos+1, v_tmp_pos-v_nl_pos-1) style, is more efficient, so the code above could be improved further still.
Purely for testing, I also wrote f_cartesian2 and f_cartesian3:
the former simply duplicates the rows, the latter takes a fixed substr per row; in those two cases PL/SQL has no real advantage.
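The hard-coded 6-level split can also be written recursively, which shows the underlying idea without the nesting. The sketch below is my reconstruction of the technique, not the author's code: halve the block at its middle delimiter and recurse, so every subsequent search and slice works on an ever shorter string.

```python
def split_dichotomy(text, lines, delim="\n"):
    # `lines` delimiter-terminated lines in `text`: split just past the
    # middle delimiter and recurse, mirroring the 6-level binary split
    if lines == 1:
        return [text[:text.index(delim)]]
    half = lines // 2
    pos = 0
    for _ in range(half):  # position just past the half-th delimiter
        pos = text.index(delim, pos) + 1
    return (split_dichotomy(text[:pos], half, delim)
            + split_dichotomy(text[pos:], lines - half, delim))

block = ("a" * 60 + "\n") * 64  # one 64-line varchar block
assert split_dichotomy(block, 64) == ["a" * 60] * 64
```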
--SQL connect by
bill@ORCL> select id, text
  2  from t1
  3  connect by prior id=id and level<=length(text)-length(replace(text, chr(10))) and prior dbms_random.value>0;

12800 rows selected.

Elapsed: 00:00:06.81

Execution Plan
----------------------------------------------------------
Plan hash value: 3874795171

-------------------------------------------------------------------------------------
| Id | Operation                     | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT              |      |  186 |  366K |    68  (0) | 00:00:01 |
|* 1 |  CONNECT BY WITHOUT FILTERING |      |      |       |            |          |
|  2 |   TABLE ACCESS FULL           | T1   |  186 |  366K |    68  (0) | 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=PRIOR "ID")
       filter(LEVEL<=LENGTH("TEXT")-LENGTH(REPLACE("TEXT",' ')) AND PRIOR
              "DBMS_RANDOM"."VALUE"()>0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
        313  consistent gets
          0  physical reads
          0  redo size
     236099  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      12800  rows processed

--SQL Cartesian
bill@ORCL> select id, text
  2  from t1,
  3  (select rownum n from dual connect by rownum<=64) b;

12800 rows selected.

Elapsed: 00:00:05.00

Execution Plan
----------------------------------------------------------
Plan hash value: 894562235

----------------------------------------------------------------------------------------
| Id | Operation                        | Name | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                 |      |  186 |  366K |    70  (0) | 00:00:01 |
|  1 |  MERGE JOIN CARTESIAN            |      |  186 |  366K |    70  (0) | 00:00:01 |
|  2 |   VIEW                           |      |    1 |       |     2  (0) | 00:00:01 |
|  3 |    COUNT                         |      |      |       |            |          |
|* 4 |     CONNECT BY WITHOUT FILTERING |      |      |       |            |          |
|  5 |      FAST DUAL                   |      |    1 |       |     2  (0) | 00:00:01 |
|  6 |   BUFFER SORT                    |      |  186 |  366K |    70  (0) | 00:00:01 |
|  7 |    TABLE ACCESS FULL             | T1   |  186 |  366K |    68  (0) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(ROWNUM<=64)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
        378  consistent gets
          0  physical reads
          0  redo size
     280133  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
      12800  rows processed

--pipelined table function
bill@ORCL> select id, item_txt
  2  from table(refcur_pkg.f_cartesian2(cursor(
  3  select id, length(text)-length(replace(text, chr(10))), text
  4  from t1
  5  )));

12800 rows selected.

Elapsed: 00:00:05.16

Execution Plan
----------------------------------------------------------
Plan hash value: 2991304848

---------------------------------------------------------------------------------------------------
| Id | Operation                          | Name         | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |              | 8168 |  271K |    29  (0) | 00:00:01 |
|  1 |  VIEW                              |              | 8168 |  271K |    29  (0) | 00:00:01 |
|  2 |   COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN2 | 8168 |       |    29  (0) | 00:00:01 |
|  3 |    TABLE ACCESS FULL               | T1           |  186 |  366K |    68  (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
        266  recursive calls
          0  db block gets
        529  consistent gets
          0  physical reads
          0  redo size
     236103  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12800  rows processed

bill@ORCL> select id, item_txt
  2  from table(refcur_pkg.f_cartesian3(cursor(
  3  select id, length(text)-length(replace(text, chr(10))), text
  4  from t1
  5  )));

12800 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 2831580648

---------------------------------------------------------------------------------------------------
| Id | Operation                          | Name         | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |              | 8168 |  271K |    29  (0) | 00:00:01 |
|  1 |  VIEW                              |              | 8168 |  271K |    29  (0) | 00:00:01 |
|  2 |   COLLECTION ITERATOR PICKLER FETCH| F_CARTESIAN3 | 8168 |       |    29  (0) | 00:00:01 |
|  3 |    TABLE ACCESS FULL               | T1           |  186 |  366K |    68  (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Statistics
----------------------------------------------------------
        278  recursive calls
          0  db block gets
        555  consistent gets
          0  physical reads
        124  redo size
     228407  bytes sent via SQL*Net to client
       9927  bytes received via SQL*Net from client
        855  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      12800  rows processed
This whole series of tests and comparisons shook me quite a bit. Using SQL to build a Cartesian product and then carve different pieces out of it seems like exactly the kind of thing SQL was born for; in the end I still do not know whether the task is genuinely a poor fit for SQL, or whether Oracle simply has not implemented it well. Readers may want to try the same tests on other databases.
In any case, from now on, whenever this requirement comes up in Oracle, consider borrowing the PL/SQL approach shown here.