Parallelism: Hard to Say I Love You

Using the Oracle database as an example, this article shows that in the world of computing, parallelism is a powerful weapon: used well, it can dramatically boost performance; used badly, it can at best fall short of the expected performance and at worst drag down the availability of the whole system. Hence the title: parallelism, it's hard to say I love you.
Below we walk through some real test scenarios. Test environment: Oracle RAC 11.2.0.4 (3 nodes).

1. Parallel insert has no effect

Test case:
create table Z_OBJ tablespace TBS_1 as select * from dba_objects ;
insert /*+ append parallel(t0,16) */ into Z_OBJ t0 select /*+ parallel(t1,16) */ * from Z_OBJ t1;
commit;
--Execute repeatedly, then check the segment size
select owner,segment_name,bytes/1024/1024 from dba_segments where segment_name='Z_OBJ';

Running this test case, we found that the requested degree of parallelism was not actually used and throughput was poor (monitoring showed I/O writes of only a few hundred MB/s, where several GB/s would be normal).
Check the execution plan:

SQL> explain plan for insert /*+ append parallel(t0,16) */ into Z_OBJ t0 select /*+ parallel(t1,16) */ * from Z_OBJ t1;

Explained.

SQL> set lines 1000 pages 200
SQL> select * from table(dbms_xplan.display());  

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1886916412

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT      |          |    91M|    17G| 23842   (1)| 00:00:01 |        |      |            |
|   1 |  LOAD AS SELECT       | Z_OBJ    |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 |    91M|    17G| 23842   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR |          |    91M|    17G| 23842   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| Z_OBJ    |    91M|    17G| 23842   (1)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

16 rows selected.

As you can see, only the query side uses parallelism; despite the hint we specified, the insert side does not.
We need to explicitly enable parallel DML:

alter session enable parallel dml;

Checking the execution plan again, the insert side can now run in parallel:

SQL> explain plan for insert /*+ append parallel(t0,16) */ into Z_OBJ t0 select /*+ parallel(t1,16) */ * from Z_OBJ t1;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2135351304

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT      |          |    91M|    17G| 23842   (1)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |    91M|    17G| 23842   (1)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT     | Z_OBJ    |       |       |            |          |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR |          |    91M|    17G| 23842   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| Z_OBJ    |    91M|    17G| 23842   (1)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

16 rows selected.

Summary 1: Not just insert: parallel execution of any DML operation requires explicitly enabling parallel DML with alter session enable parallel dml;
Note that although parallel DML produced a dramatic speedup in this test, in production you must weigh carefully whether to use it, because of the impact of TM locks. I once saw a customer enable parallel DML while the application was also issuing a large number of concurrent calls, which caused severe TM lock waits; in the end, disabling parallel DML eliminated the waits and actually improved performance.
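
As a quick sanity check (a minimal sketch using the standard v$session status columns), you can confirm whether the current session really has parallel DML enabled before running the insert:

-- PDML_STATUS flips from DISABLED to ENABLED after "alter session enable parallel dml"
select pdml_status, pddl_status, pq_status
from v$session
where sid = sys_context('userenv', 'sid');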

2. Keeping parallelism on the local node

By default, parallel operations are distributed across the RAC nodes, but in many production databases we do not want parallel execution to span nodes. In that case, set this parameter:
alter system set parallel_force_local=true sid='*';
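
For reference, you can check the current value in SQL*Plus (plain usage, shown for completeness):

show parameter parallel_force_local
-- revert to the default once cross-node parallel execution is wanted again:
-- alter system set parallel_force_local=false sid='*';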

Now when we run the insert and watch dstat on every node, only the local node shows write activity of several hundred MB/s, confirming that parallel_force_local=true has taken effect dynamically:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  1   0  98   0   0   0| 163M  326M|  74k   61k|   0     0 |  17k   51k
  2   0  98   0   0   0| 164M  325M| 479k   29k|   0     0 |  18k   51k
  2   0  98   0   0   0| 165M  330M| 833k 1347k|   0     0 |  21k   54k
  1   0  98   0   0   0| 167M  336M|  47k   58k|   0     0 |  18k   52k
  1   0  98   0   0   0| 173M  340M| 507k   31k|   0     0 |  18k   53k
  1   0  98   0   0   0| 176M  354M|  77k  546k|   0     0 |  18k   54k
  1   0  98   0   0   0| 168M  341M|  43k   44k|   0     0 |  18k   53k
  2   0  98   0   0   0| 177M  353M|  32k   42k|   0     0 |  18k   54k
  2   0  98   0   0   0| 183M  362M|  65k   67k|   0     0 |  17k   54k
  1   0  98   0   0   0| 163M  329M|  44k   44k|   0     0 |  16k   49k
  1   0  98   0   0   0| 165M  328M|  39k   33k|   0     0 |  18k   51k
  1   0  98   0   0   0| 161M  323M|  43k   56k|   0     0 |  17k   50k
  2   0  98   0   0   0| 182M  360M|  44k   49k|   0     0 |  18k   55k
  1   0  98   0   0   0| 166M  331M|  34k   52k|   0     0 |  18k   51k
  2   0  98   0   0   0| 162M  327M|  25k   25k|   0     0 |  18k   51k

Combining this with the lesson from section 1 and enabling parallel DML, efficiency improves dramatically: the local node now writes several GB/s:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  8   1  90   1   0   0|2927M 5882M| 771k  140k|   0     0 | 107k  157k
  9   1  90   1   0   0|3134M 6266M| 759k 1484k|   0     0 | 108k  161k
  8   1  90   1   0   0|3021M 6042M| 154k  178k|   0     0 | 104k  155k
  9   1  90   0   0   0|3000M 6004M| 259k  266k|   0     0 | 106k  156k
  9   1  90   0   0   0|2875M 5754M| 129k  142k|   0     0 | 102k  150k
  9   1  90   0   0   0|3082M 6160M| 127k  135k|   0     0 | 108k  158k
  9   1  90   0   0   0|3044M 6095M| 655k  642k|   0     0 | 107k  158k
  9   1  89   0   0   0|2961M 5923M| 125k  134k|   0     0 | 105k  153k
  9   1  90   0   0   0|2875M 5747M| 137k  168k|   0     0 | 102k  150k
  9   1  90   0   0   0|3156M 6312M| 127k  135k|   0     0 | 109k  163k
  9   1  90   1   0   0|3144M 6291M| 130k  138k|   0     0 | 109k  162k
  9   1  90   1   0   0|3058M 6117M| 125k  143k|   0     0 | 106k  157k
  9   1  90   0   0   0|3138M 6279M| 132k  139k|   0     0 | 108k  161k
  9   1  90   0   0   0|3039M 6074M| 141k  143k|   0     0 | 106k  156k
  4   1  95   0   0   0|1237M 2615M| 986k   61k|   0     0 |  68k   90k

Summary 2: The parameter parallel_force_local=true forces parallel operations to execute on the local node only, and it takes effect dynamically.
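
You can also verify from the data dictionary that all PX slaves stay on the coordinator's instance; a minimal check against gv$px_session while the parallel insert is running:

-- With parallel_force_local=true, every PX session should report the same inst_id
select inst_id, count(*) as px_sessions
from gv$px_session
group by inst_id;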

3. The effect of increasing the degree of parallelism

Create a large table Z_OBJ_3 and insert data with a DOP of 32:
create table Z_OBJ_3 tablespace TBS_3 as select * from dba_objects ;

insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;
commit;

The insert completed in 25 seconds; raising the DOP improved performance further:

SQL> insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;

867092478 rows created.

Elapsed: 00:00:25.52

dstat monitoring now shows write throughput above 8,000 MB/s:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0 100   0   0   0|2489k 1036k|   0     0 |   0     0 |  10k 9766 
 13   1  83   2   0   0|3755M 7542M| 699k 1055k|   0     0 | 143k  210k
 12   2  84   2   0   0|3634M 7407M| 447k  453k|   0     0 | 147k  209k
 13   1  83   2   0   0|4202M 8402M| 535k  553k|   0     0 | 141k  215k
 14   1  82   2   0   0|4168M 8339M| 539k  556k|   0     0 | 144k  214k
 13   1  82   2   0   1|4109M 8224M| 546k  552k|   0     0 | 142k  210k
 13   1  83   3   0   0|4209M 8419M| 311k  327k|   0     0 | 138k  213k
 13   1  83   3   0   0|4237M 8483M| 114k  114k|   0     0 | 136k  210k
  9   1  88   1   0   1|2709M 5703M|  64k   65k|   0     0 | 156k  203k
 14   1  82   2   0   0|4189M 8383M|  91k   87k|   0     0 | 136k  205k
 13   1  82   3   0   0|4237M 8478M|  95k  101k|   0     0 | 136k  208k
 14   1  82   2   0   0|4242M 8485M|  95k  109k|   0     0 | 139k  208k
 14   1  82   3   0   0|4202M 8412M| 835k  103k|   0     0 | 137k  208k
 14   1  82   2   0   0|4288M 8563M|1143k 1930k|   0     0 | 139k  211k
 14   1  82   2   0   0|4229M 8477M| 101k   97k|   0     0 | 138k  209k

Next, create another large table, Z_OBJ_4, and insert with a DOP of 64:

create table Z_OBJ_4 tablespace TBS_4 as select * from dba_objects ;

insert /*+ append parallel(t0,64) */ into Z_OBJ_4 t0 select /*+ parallel(t1,64) */ * from Z_OBJ t1;
commit;

This insert took 28 seconds to complete: even with CPU to spare, the higher DOP brought no further gain, showing that I/O had become the bottleneck:

SQL> insert /*+ append parallel(t0,64) */ into Z_OBJ_4 t0 select /*+ parallel(t1,64) */ * from Z_OBJ t1;

867092478 rows created.

Elapsed: 00:00:28.61

dstat monitoring shows write throughput close to 8,000 MB/s:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
 14   2  81   4   0   1|3844M 7711M|3571k 2567k|   0     0 | 130k  197k
 12   1  83   3   0   0|3810M 7602M| 535k 1885k|   0     0 | 115k  175k
 13   1  82   3   0   0|3799M 7607M| 603k  654k|   0     0 | 116k  174k
 14   1  82   3   0   0|3810M 7638M| 550k  602k|   0     0 | 119k  176k
 13   1  83   3   0   0|3766M 7531M| 630k  651k|   0     0 | 114k  171k
 13   1  81   4   0   0|3804M 7608M| 620k  669k|   0     0 | 117k  175k
 13   1  82   3   0   0|3792M 7585M| 581k  616k|   0     0 | 117k  176k
 13   1  82   3   0   0|3767M 7522M| 561k  612k|   0     0 | 116k  173k
 12   1  82   3   0   0|3659M 7343M| 553k  601k|   0     0 | 115k  170k
 13   1  82   3   0   0|3659M 7340M| 609k  668k|   0     0 | 121k  179k
 13   1  82   3   0   0|3746M 7502M| 609k  644k|   0     0 | 117k  174k
 13   1  82   3   0   0|3822M 7648M| 675k  773k|   0     0 | 118k  178k
 13   1  83   3   0   0|3769M 7541M|1191k  632k|   0     0 | 115k  173k
 13   1  83   3   0   0|3864M 7725M|1749k 2533k|   0     0 | 117k  177k
 13   1  82   3   0   0|3741M 7481M| 613k  655k|   0     0 | 116k  172k

Summary 3: Increasing the DOP generally makes an operation complete faster, but the gain is bounded by the overall I/O capacity of the system.
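
When experimenting with the DOP like this, it is also worth confirming that a statement actually received the degree it asked for rather than being silently downgraded. A minimal check (run from another session while the insert is active, using standard gv$px_session columns):

-- Compare the requested DOP (req_degree) with the granted DOP (degree) per query coordinator
select qcsid, qcinst_id, req_degree, degree, count(*) as slaves
from gv$px_session
where req_degree is not null
group by qcsid, qcinst_id, req_degree, degree;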

4. Parallel test on all nodes

Test all 3 RAC nodes at the same time:
--Node 1
set time on
set timing on
drop table Z_OBJ_2 purge;
create table Z_OBJ_2 tablespace TBS_2 as select * from dba_objects where 1=2;
alter session enable parallel dml;
--INSERT Z_OBJ_2
insert /*+ append parallel(t0,32) */ into Z_OBJ_2 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;
commit;


--Node 2
set time on
set timing on
drop table Z_OBJ_3 purge;
create table Z_OBJ_3 tablespace TBS_3 as select * from dba_objects where 1=2;
alter session enable parallel dml;
--INSERT Z_OBJ_3
insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;
commit;

--Node 3
set time on
set timing on
drop table Z_OBJ_4 purge;
create table Z_OBJ_4 tablespace TBS_4 as select * from dba_objects where 1=2;
alter session enable parallel dml;
--INSERT Z_OBJ_4
insert /*+ append parallel(t0,32) */ into Z_OBJ_4 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;
commit;

Observe the insert elapsed time on each node (each individual run gets slower, a consequence of the shared I/O bottleneck):

15:26:06 SQL> insert /*+ append parallel(t0,32) */ into Z_OBJ_2 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;

867092478 rows created.

Elapsed: 00:00:48.53

15:25:23 SQL>  insert /*+ append parallel(t0,32) */ into Z_OBJ_3 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;

867092478 rows created.

Elapsed: 00:00:45.84

15:25:21 SQL>  insert /*+ append parallel(t0,32) */ into Z_OBJ_4 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;

867092478 rows created.

Elapsed: 00:00:47.63

Watch dstat on all nodes at the same time:

--node1:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  7   1  82   9   0   0|2110M 4223M| 169k  230k|   0     0 |  78k  122k
  7   1  82   9   0   0|2107M 4209M| 176k  178k|   0     0 |  79k  123k
  9   1  81   9   0   0|2614M 5237M| 190k  195k|   0     0 |  96k  148k
  8   1  81  10   0   0|2171M 4339M| 195k  232k|   0     0 |  84k  127k
  7   1  83   9   0   0|1975M 3947M| 220k  184k|   0     0 |  76k  117k
  7   1  82   9   0   0|2051M 4099M| 166k  169k|   0     0 |  78k  121k
  7   1  82  10   0   0|2059M 4121M|1193k  170k|   0     0 |  79k  121k
  7   1  83   9   0   0|2001M 4011M| 384k 1463k|   0     0 |  76k  118k
  3   0  93   4   0   0| 802M 1570M| 148k  144k|   0     0 |  36k   53k
  2   0  96   2   0   0| 355M  886M| 113k  137k|   0     0 |  47k   61k
  8   1  82   9   0   0|2122M 4255M| 189k  202k|   0     0 |  79k  123k
  7   1  83   9   0   0|2040M 4069M| 162k  164k|   0     0 |  76k  119k
  8   1  82   9   0   0|2208M 4436M| 839k  843k|   0     0 |  83k  130k
  9   1  83   7   0   0|2506M 5037M| 305k  307k|   0     0 |  94k  145k
  4   0  93   2   0   0|1098M 2273M| 218k  233k|   0     0 |  49k   72k
  
--node2:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  6   1  82  11   0   0|2152M 4312M| 221k  224k|   0     0 |  79k  130k
  7   1  82  10   0   0|2226M 4447M| 216k  218k|   0     0 |  81k  133k
 10   1  81   8   0   0|2775M 5559M| 244k  214k|   0     0 | 100k  159k
  7   1  83   9   0   0|2110M 4205M| 220k  221k|   0     0 |  77k  126k
  7   1  83  10   0   0|2104M 4219M| 231k  266k|   0     0 |  76k  126k
  7   1  83  10   0   0|2158M 4311M| 207k  207k|   0     0 |  78k  129k
  7   1  83  10   0   0|2103M 4214M| 877k  849k|   0     0 |  76k  126k
  7   1  82  10   0   0|2109M 4214M| 207k  209k|   0     0 |  76k  124k
 10   1  81   8   0   0|2934M 5866M| 212k  216k|   0     0 | 102k  165k
  7   1  82  10   0   0|2281M 4551M| 207k  227k|   0     0 |  82k  133k
  7   1  83  10   0   0|2136M 4281M| 206k  205k|   0     0 |  79k  128k
  6   1  84  10   0   0|1951M 3940M| 313k  341k|   0     0 |  73k  120k
  4   0  92   4   0   0|1044M 2250M| 672k  642k|   0     0 |  56k   88k
  0   0  99   0   0   0|  50M  116M| 258k  276k|   0     0 |  11k   14k
  0   0 100   0   0   0| 323k   58k| 208k  202k|   0     0 |8385    10k

--node3:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  6   1  83  10   0   0|2144M 4274M| 149k  156k|   0     0 |  77k  129k
  6   1  82  11   0   0|2223M 4452M| 165k  189k|   0     0 |  80k  133k
  6   1  82  11   0   0|2203M 4404M| 189k  198k|   0     0 |  79k  131k
  7   0  83  10   0   0|2119M 4233M| 140k  211k|   0     0 |  75k  125k
  7   1  83  10   0   0|2156M 4311M| 870k  731k|   0     0 |  78k  128k
  7   1  82  10   0   0|2157M 4318M| 143k  149k|   0     0 |  79k  129k
  7   1  83   9   0   0|2172M 4344M| 165k  170k|   0     0 |  79k  131k
  7   1  83  10   0   0|2139M 4283M| 140k  141k|   0     0 |  78k  125k
  7   1  83  10   0   0|2145M 4303M| 143k  151k|   0     0 |  78k  129k
  7   1  83  10   0   0|2121M 4226M| 146k  450k|   0     0 |  76k  126k
  7   1  82  10   0   0|2442M 4884M| 460k  155k|   0     0 |  87k  144k
  6   0  83  10   0   0|2083M 4177M| 217k  156k|   0     0 |  76k  126k
  4   0  88   7   0   0|1445M 2863M| 130k  126k|   0     0 |  54k   89k
  2   0  94   3   0   0| 577M 1341M| 121k  124k|   0     0 |  53k   73k
  7   1  82  10   0   0|2219M 4437M| 157k  193k|   0     0 |  81k  133k

At this point one question remained: why not just use create? Trying a plain CTAS per the test case is very disappointing: only 300-odd MB/s of writes, taking nearly 10 minutes to finish, whereas the parallel insert above reached 8,000+ MB/s and completed in just over 20 seconds:

drop table Z_OBJ_2 purge;
create table Z_OBJ_2 tablespace TBS_2 as select /*+ parallel(t1,32) */ * from Z_OBJ t1;
Elapsed: 00:09:19.52

15:49:58 SQL> insert /*+ append parallel(t0,32) */ into Z_OBJ_2 t0 select /*+ parallel(t1,32) */ * from Z_OBJ t1;
867092478 rows created.
Elapsed: 00:00:25.24

Clearly the create ran essentially without parallelism. How do we make the create use the DOP as well? Rewrite the SQL as follows:

--With parallelism, creating the ~100 GB table took only 26s:
drop table Z_OBJ_2 purge;
create table Z_OBJ_2 tablespace TBS_2 parallel(degree 32) as select /*+ parallel(t1,32) */ * from Z_OBJ t1;
Elapsed: 00:00:26.76

--With parallelism plus nologging, the difference is small: the ~100 GB table took only 25s:
drop table Z_OBJ_2 purge;
create table Z_OBJ_2 tablespace TBS_2 parallel(degree 32) nologging as select /*+ parallel(t1,32) */ * from Z_OBJ t1;
Elapsed: 00:00:25.77
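
One caveat: the parallel clause in the CREATE statement is stored as the table's default degree, so later queries against the table may silently run in parallel as well. It is common practice to reset the attribute once the load is done:

-- The PARALLEL attribute persists in the table definition; reset it after the load
alter table Z_OBJ_2 noparallel;
-- DEGREE should now show 1
select table_name, degree from user_tables where table_name = 'Z_OBJ_2';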

Summary 4: When using parallelism, pay special attention to whether every part of the operation actually uses it. And when all nodes run parallel operations at once, the combined throughput is again bounded by the overall I/O capacity of the system.

5. Parallelism with multiple RMAN channels

Symptom: RMAN allocates multiple channels, but parallelism is not actually used.
Build the test case:

create tablespace dbs_d_test;
alter tablespace dbs_d_test add datafile; --this becomes file# 11
alter tablespace dbs_d_test add datafile; --this becomes file# 12
alter tablespace dbs_d_test add datafile; --this becomes file# 13

alter database datafile 11,12,13 resize 1G;
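
The file numbers (11, 12, 13 here) are specific to this database, so confirm them before referencing them in RMAN commands, e.g.:

-- Confirm the file# values assigned to the new tablespace
select file_id, file_name, bytes/1024/1024 as mb
from dba_data_files
where tablespace_name = 'DBS_D_TEST';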

5.1 Multiple RMAN channels without parallelism

Back up with RMAN:
run {
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate channel c3 device type disk;

backup as copy datafile 11 format '/tmp/incr/copy11.bak';
backup as copy datafile 12 format '/tmp/incr/copy12.bak';
backup as copy datafile 13 format '/tmp/incr/copy13.bak';

release channel c1;
release channel c2;
release channel c3;
}

Use the following SQL to watch long-running operations:

select inst_id, sid, username, opname, target, sofar, totalwork, sofar * 100 / totalwork
from gv$session_longops
where sofar < totalwork;

With this backup script, although multiple channels are allocated, observation shows no parallelism at all: the 3 datafiles are backed up serially. This is visible both in the long-operations output above and in the RMAN log:

RMAN> run {
2> allocate channel c1 device type disk;
3> allocate channel c2 device type disk;
4> allocate channel c3 device type disk;
5> 
6> backup as copy datafile 11 format '/tmp/incr/copy11.bak';
7> backup as copy datafile 12 format '/tmp/incr/copy12.bak';
8> backup as copy datafile 13 format '/tmp/incr/copy13.bak';
9> 
10> release channel c1;
11> release channel c2;
12> release channel c3;
13> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: sid=128 instance=jy1 devtype=DISK

allocated channel: c2
channel c2: sid=117 instance=jy1 devtype=DISK

allocated channel: c3
channel c3: sid=129 instance=jy1 devtype=DISK

Starting backup at 29-AUG-18
channel c1: starting datafile copy
input datafile fno=00011 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.615.985387387
output filename=/tmp/incr/copy11.bak tag=TAG20180829T002101 recid=13 stamp=985393279
channel c1: datafile copy complete, elapsed time: 00:00:25
Finished backup at 29-AUG-18

Starting backup at 29-AUG-18
channel c1: starting datafile copy
input datafile fno=00012 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.613.985387391
output filename=/tmp/incr/copy12.bak tag=TAG20180829T002127 recid=14 stamp=985393305
channel c1: datafile copy complete, elapsed time: 00:00:25
Finished backup at 29-AUG-18

Starting backup at 29-AUG-18
channel c1: starting datafile copy
input datafile fno=00013 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.611.985387395
output filename=/tmp/incr/copy13.bak tag=TAG20180829T002153 recid=15 stamp=985393330
channel c1: datafile copy complete, elapsed time: 00:00:25
Finished backup at 29-AUG-18

released channel: c1

released channel: c2

released channel: c3

The work is serial and all on channel c1: backing up the 3 datafile copies takes 3 × 25s = 75s.

5.2 Rewriting the backup command to use parallelism

The improved version does use parallelism:
run {
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate channel c3 device type disk;

backup as copy datafile 11,12,13 format '/tmp/incr/copy_%u.bak';

release channel c1;
release channel c2;
release channel c3;
}

The log shows:

RMAN> run {
2> allocate channel c1 device type disk;
3> allocate channel c2 device type disk;
4> allocate channel c3 device type disk;
5> 
6> backup as copy datafile 11,12,13 format '/tmp/incr/copy_%u.bak';
7> 
8> release channel c1;
9> release channel c2;
10> release channel c3;
11> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: sid=129 instance=jy1 devtype=DISK

allocated channel: c2
channel c2: sid=127 instance=jy1 devtype=DISK

allocated channel: c3
channel c3: sid=119 instance=jy1 devtype=DISK

Starting backup at 29-AUG-18
channel c1: starting datafile copy
input datafile fno=00011 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.615.985387387
channel c2: starting datafile copy
input datafile fno=00012 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.613.985387391
channel c3: starting datafile copy
input datafile fno=00013 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.611.985387395
output filename=/tmp/incr/copy_14tbnq76.bak tag=TAG20180829T002302 recid=16 stamp=985393432
channel c1: datafile copy complete, elapsed time: 00:00:55
output filename=/tmp/incr/copy_15tbnq76.bak tag=TAG20180829T002302 recid=17 stamp=985393432
channel c2: datafile copy complete, elapsed time: 00:00:55
output filename=/tmp/incr/copy_16tbnq76.bak tag=TAG20180829T002302 recid=18 stamp=985393435
channel c3: datafile copy complete, elapsed time: 00:00:55
Finished backup at 29-AUG-18

released channel: c1

released channel: c2

released channel: c3

This time the work is parallel, spread across channels c1, c2 and c3: the 3 datafile copies complete in a single 55s interval, i.e. 55s total.
So why didn't parallelism multiply the throughput? As noted earlier, the overall I/O capacity of the system was the bottleneck; blindly increasing parallelism is not always worthwhile.
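
As an aside, instead of allocating channels by hand in a run block, the same multi-channel behavior can be made persistent with RMAN's standard CONFIGURE command (shown as an alternative; the tests above used explicit channels):

CONFIGURE DEVICE TYPE DISK PARALLELISM 3;
-- subsequent simple backup commands now get 3 channels automatically
BACKUP AS COPY DATAFILE 11,12,13 FORMAT '/tmp/incr/copy_%u.bak';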

5.3 Changing the backup method to improve efficiency

If the datafiles are large but mostly empty, consider backing up as backup sets instead; this reduces the space the backup occupies and usually speeds up the backup as well:
run {
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate channel c3 device type disk;

backup as compressed backupset datafile 11,12,13 format '/tmp/incr/datafile_%u.bak';

release channel c1;
release channel c2;
release channel c3;
}

The log shows:

RMAN> run {
2> allocate channel c1 device type disk;
3> allocate channel c2 device type disk;
4> allocate channel c3 device type disk;
5> 
6> backup as compressed backupset datafile 11,12,13 format '/tmp/incr/datafile_%u.bak';
7> 
8> release channel c1;
9> release channel c2;
10> release channel c3;
11> }

using target database control file instead of recovery catalog
allocated channel: c1
channel c1: sid=128 instance=jy1 devtype=DISK

allocated channel: c2
channel c2: sid=134 instance=jy1 devtype=DISK

allocated channel: c3
channel c3: sid=116 instance=jy1 devtype=DISK

Starting backup at 29-AUG-18
channel c1: starting compressed full datafile backupset
channel c1: specifying datafile(s) in backupset
input datafile fno=00011 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.615.985387387
channel c1: starting piece 1 at 29-AUG-18
channel c2: starting compressed full datafile backupset
channel c2: specifying datafile(s) in backupset
input datafile fno=00012 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.613.985387391
channel c2: starting piece 1 at 29-AUG-18
channel c3: starting compressed full datafile backupset
channel c3: specifying datafile(s) in backupset
input datafile fno=00013 name=+ZHAOJINGYU/jy/datafile/dbs_d_test.611.985387395
channel c3: starting piece 1 at 29-AUG-18
channel c1: finished piece 1 at 29-AUG-18
piece handle=/tmp/incr/datafile_17tbnqi9.bak tag=TAG20180829T002857 comment=NONE
channel c1: backup set complete, elapsed time: 00:00:02
channel c3: finished piece 1 at 29-AUG-18
piece handle=/tmp/incr/datafile_19tbnqia.bak tag=TAG20180829T002857 comment=NONE
channel c3: backup set complete, elapsed time: 00:00:01
channel c2: finished piece 1 at 29-AUG-18
piece handle=/tmp/incr/datafile_18tbnqi9.bak tag=TAG20180829T002857 comment=NONE
channel c2: backup set complete, elapsed time: 00:00:05
Finished backup at 29-AUG-18

released channel: c1

released channel: c2

released channel: c3

Because these files contain essentially no business data, the improvement here is especially dramatic: the backup completed in just 5 seconds.
Summary 5: Beyond using parallelism sensibly, always ask whether there is a way to do less work in the first place; don't let parallelism busily do useless work and squander compute resources.
There are other interesting parallelism scenarios too. For example, I once saw a developer mistype the DOP in a parallel hint, causing Oracle to fall back to automatic DOP, i.e. execution at the maximum degree; the statement consumed virtually all system resources, other operations could no longer run efficiently, and a performance incident followed.
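
To guard against this kind of runaway DOP, the instance-wide parallel limits can be capped; a minimal sketch using standard 11.2 parameters (the values are purely illustrative):

-- parallel_degree_limit caps the DOP chosen under automatic DOP;
-- parallel_max_servers caps the instance's total PX slave pool
alter system set parallel_degree_limit=16 sid='*';
alter system set parallel_max_servers=64 sid='*';
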
Reading this far, it is clear that parallelism does come with many pitfalls, but we should not give up eating for fear of choking. Master the relevant knowledge and you can wield this weapon well, let it shine in the right scenarios, and proudly say: "Parallelism, it's easy to say I love you after all."


Some of the case studies in this article are adapted from posts on my earlier blog.
