Understanding LGWR, Log File Sync Waits, and Commit Performance Problems



I. Overview:

1.  How commit and log file sync work

2.  Why log file sync waits take too long

3.  How to measure where the problem lies

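II. Causes of log file sync waits

1.  By default, committing a transaction has to wait for log file sync. This includes:

(1) User commits (the user commit statistics can be viewed through v$sesstat)

(2) DDL: mainly caused by recursive transaction commits

(3) Recursive data dictionary DML operations

2.  Rollbacks also cause log file sync waits

   (1) User rollbacks: rollback operations issued by the user or by the application

   (2) Transaction rollbacks: 1. internal Oracle rollbacks caused by failed operations; 2. space allocation or ASSM-related problems, long-running queries cancelled by the user, killed sessions, and so on.

The following figure shows the flow of commit and log file sync: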



Log file sync performance>disk IO speed

**** Most of the log file sync wait time is actually spent in log file parallel write, much as DBWR waits on db file parallel write

**** The rest of the log file sync wait is spent on scheduling delays, IPC latency, and so on


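Let's walk through the whole log file sync wait:

1.  The foreground process posts LGWR and then goes to sleep

The log file sync wait starts being timed at this point

On Unix platforms this call is implemented with semaphores

2.  LGWR is woken up and gets a CPU time slice to do its work

LGWR issues the I/O request

LGWR goes back to sleep, waiting on log file parallel write

3.  Once the I/O call completes at the storage level, the OS wakes up LGWR

LGWR again has to get a CPU time slice

It then marks the log file parallel write wait as complete and posts the foreground process

4.  The foreground process is woken by LGWR, gets a CPU time slice, and marks the log file sync wait as complete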
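
The four steps can be mimicked with two threads and two semaphores. This is only a toy model, not Oracle code: the function name `simulate_commit`, the semaphore names, and the 10 ms sleep that stands in for the redo write are all assumptions made for illustration. It shows why log file sync is always at least as long as log file parallel write: the write interval is nested inside the sync interval, and whatever is left over is scheduling and posting overhead.

```python
import threading
import time


def simulate_commit(write_seconds=0.01):
    """Toy model of one commit: foreground posts LGWR, LGWR writes, LGWR posts back."""
    post_lgwr = threading.Semaphore(0)  # foreground -> LGWR
    post_fg = threading.Semaphore(0)    # LGWR -> foreground
    t = {}

    def lgwr():
        post_lgwr.acquire()                      # asleep until the foreground posts
        t["write_start"] = time.perf_counter()   # log file parallel write begins
        time.sleep(write_seconds)                # stand-in for the redo I/O
        t["write_end"] = time.perf_counter()     # log file parallel write ends
        post_fg.release()                        # post the foreground

    worker = threading.Thread(target=lgwr)
    worker.start()

    t["sync_start"] = time.perf_counter()        # log file sync begins
    post_lgwr.release()                          # step 1: post LGWR, then sleep
    post_fg.acquire()                            # step 4: woken by LGWR
    t["sync_end"] = time.perf_counter()          # log file sync ends
    worker.join()

    sync = t["sync_end"] - t["sync_start"]
    write = t["write_end"] - t["write_start"]
    return sync, write


sync, write = simulate_commit()
print(f"log file sync          : {sync * 1000:.2f} ms")
print(f"log file parallel write: {write * 1000:.2f} ms")
print(f"scheduling/IPC overhead: {(sync - write) * 1000:.2f} ms")
```

On a busy machine the overhead line grows while the write time stays the same, which is the situation the "scheduling delays" remark above describes.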

Measuring LGWR speed with the snapper script:

    ---------------------------------------------------------------------------------
    SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH
    ---------------------------------------------------------------------------------
    1096, (LGWR) , STAT, messages sent , 12 , 12,
    1096, (LGWR) , STAT, messages received , 10 , 10,
    1096, (LGWR) , STAT, background timeouts , 1 , 1,
    1096, (LGWR) , STAT, physical write total IO requests , 40, 40,
    1096, (LGWR) , STAT, physical write total multi block request, 38, 38,
    1096, (LGWR) , STAT, physical write total bytes, 2884608 , 2.88M,
    1096, (LGWR) , STAT, calls to kcmgcs , 20 , 20,
    1096, (LGWR) , STAT, redo wastage , 4548 , 4.55k,
    1096, (LGWR) , STAT, redo writes , 10 , 10,
    1096, (LGWR) , STAT, redo blocks written , 2817 , 2.82k,
    1096, (LGWR) , STAT, redo write time , 25 , 25,
    1096, (LGWR) , WAIT, LGWR wait on LNS , 1040575 , 1.04s, 104.1%, |@@@@@@@@@@|
    1096, (LGWR) , WAIT, log file parallel write , 273837 , 273.84ms, 27.4%,|@@@ |
    1096, (LGWR) , WAIT, events in waitclass Other , 1035172 , 1.04s , 103.5%,|@@@@@@@@@@|
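
A few per-write figures can be derived from this sample. The sketch below is a rough calculation under two assumptions that are not stated in the output itself: that the redo block size is 512 bytes, and that `redo write time` is reported in centiseconds (10 ms units), as the Oracle reference describes it. The function name `lgwr_per_write` is made up for this example.

```python
def lgwr_per_write(redo_writes, redo_blocks, redo_write_time_cs, lfpw_wait_us,
                   redo_block_bytes=512):
    """Derive per-write averages from one snapper sample of LGWR.

    redo_block_bytes=512 is an assumption; check the real value on your platform.
    """
    return {
        # average size of one redo write, in KB
        "avg_write_kb": redo_blocks * redo_block_bytes / redo_writes / 1024,
        # average write latency from the 'redo write time' statistic (cs -> ms)
        "avg_ms_from_stat": redo_write_time_cs * 10 / redo_writes,
        # average write latency from the 'log file parallel write' wait (us -> ms)
        "avg_ms_from_wait": lfpw_wait_us / 1000 / redo_writes,
    }


# Values copied from the snapper sample above.
result = lgwr_per_write(redo_writes=10, redo_blocks=2817,
                        redo_write_time_cs=25, lfpw_wait_us=273837)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The two latency figures come from independent instrumentation, so when they roughly agree, as they do here, the write time is most likely real I/O time rather than a measurement artifact.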
LGWR and asynchronous I/O

    oracle@linux01:~$ strace -cp `pgrep -f lgwr`
    Process 12457 attached - interrupt to quit
    ^CProcess 12457 detached
    % time seconds     usecs/call  calls     errors    syscall
    ------ ----------- ----------- --------- --------- --------------
    100.00  0.010000    263        38        3          semtimedop
    0.00    0.000000    0          213                  times
    0.00    0.000000    0          8                    getrusage
    0.00    0.000000    0          701                  gettimeofday
    0.00    0.000000    0          41                   io_getevents
    0.00    0.000000    0          41                   io_submit
    0.00    0.000000    0          2                    semop
    0.00    0.000000    0          37                   semctl
    ------ ----------- ----------- --------- --------- --------------
    100.00  0.010000               1081      3          total
*** With asynchronous I/O, the io_getevents call is where the log file parallel write wait event is measured
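
To read such a summary quickly, the per-syscall call counts can be pulled out with a few lines of Python. This is a minimal sketch: `calls_by_syscall` is a hypothetical helper, and it assumes the `strace -c` column order shown above (% time, seconds, usecs/call, calls, optional errors, syscall).

```python
# Data rows copied from the strace -c summary above.
STRACE_ROWS = """\
100.00  0.010000    263        38        3          semtimedop
0.00    0.000000    0          213                  times
0.00    0.000000    0          8                    getrusage
0.00    0.000000    0          701                  gettimeofday
0.00    0.000000    0          41                   io_getevents
0.00    0.000000    0          41                   io_submit
0.00    0.000000    0          2                    semop
0.00    0.000000    0          37                   semctl
"""


def calls_by_syscall(rows):
    """Map syscall name -> number of calls."""
    calls = {}
    for line in rows.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The syscall name is always last; 'calls' is always the 4th column,
        # whether or not the optional 'errors' column is filled in.
        calls[fields[-1]] = int(fields[3])
    return calls


calls = calls_by_syscall(STRACE_ROWS)
print("io_submit      :", calls["io_submit"])
print("io_getevents   :", calls["io_getevents"])
print("semaphore calls:", calls["semtimedop"] + calls["semop"] + calls["semctl"])
```

Each io_submit is matched by one io_getevents here, which fits the submit-then-reap pattern of asynchronous redo writes, while the semaphore calls are the sleeping and posting described earlier.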
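Latch tuning related to redo and commit
1. redo allocation latches: as the name suggests, these latches protect space allocation when a process writes redo into the log buffer
2. redo copy latches: needed when redo is copied from the private memory area into the log buffer, and held until the redo stream has been copied. LGWR has to know
  that the copy has finished before it can write the buffers to disk; until then LGWR waits on the LGWR wait for redo copy event. Related tunable parameter: _log_simultaneous_copies
Wait events:
1. log file sync
2. log file parallel write
3. log file single write
The related statistics are available in v$sesstat and v$sysstat:
(1. redo size 2. redo write time 3. user commits 4. user rollbacks 5. transaction rollbacks)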
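
These statistics are cumulative counters, so they only become useful as deltas between two snapshots. The sketch below shows the arithmetic; the function name `commit_rates` and the snapshot values are invented for illustration and do not come from any real system.

```python
def commit_rates(before, after, seconds):
    """Turn two snapshots of cumulative statistics into per-interval rates."""
    delta = {name: after[name] - before[name] for name in before}
    commits = delta["user commits"]
    rollbacks = delta["user rollbacks"]
    ended = commits + rollbacks
    return {
        "commits_per_sec": commits / seconds,
        "redo_bytes_per_commit": delta["redo size"] / commits if commits else 0.0,
        "rollback_pct": 100.0 * rollbacks / ended if ended else 0.0,
    }


# Hypothetical snapshots taken 60 seconds apart (made-up numbers).
snap1 = {"redo size": 1_000_000, "user commits": 100, "user rollbacks": 10}
snap2 = {"redo size": 7_000_000, "user commits": 400, "user rollbacks": 40}

for name, value in commit_rates(snap1, snap2, seconds=60).items():
    print(f"{name}: {value:.2f}")
```

A high commit rate with a small redo size per commit points at the application committing too often, whereas a large redo size per commit points more toward write volume and storage.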
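Now let's look at a wait event that is not a commit problem: log buffer space. This event is mainly caused by a redo log buffer that is too small, so that sessions contend for space while LGWR flushes redo, or by an I/O subsystem that is too slow. In many people's experience, a somewhat larger log buffer can relieve log file sync, but only up to a point. If your workload writes a lot on every commit and the system's I/O has already hit the limit of the storage subsystem, enlarging the log buffer will not help. According to many MOS notes, setting this parameter is still not recommended in 10g.
log file single write:
Single-block redo I/O is mostly used only for reading and writing the log file header block. Log switches are the main case: when archiving happens the log header has to be updated, so both LGWR and ARCH may wait on this event.
Below is the trace taken during a log switch: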
    WAIT #0: nam='log file sequential read' ela= 12607 log#=0 block#=1
    WAIT #0: nam='log file sequential read' ela= 21225 log#=1 block#=1
    WAIT #0: nam='control file sequential read' ela= 358 file#=0
    WAIT #0: nam='log file single write' ela= 470 log#=0 block#=1
    WAIT #0: nam='log file single write' ela= 227 log#=1 block#=1
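
When a trace has more than a handful of WAIT lines, summing the elapsed time per event makes the picture clearer. The following is a small sketch, assuming that `ela` is reported in microseconds (as in 10g-era trace files); `elapsed_by_event` is a hypothetical helper, and the regular expression tolerates the value with or without the `= ` separator.

```python
import re
from collections import defaultdict

# Matches the event name and the elapsed time of one WAIT line.
WAIT_RE = re.compile(r"nam='([^']+)'\s+ela=?\s*(\d+)")

# WAIT lines copied from the log switch trace above.
TRACE = """\
WAIT #0: nam='log file sequential read' ela= 12607 log#=0 block#=1
WAIT #0: nam='log file sequential read' ela= 21225 log#=1 block#=1
WAIT #0: nam='control file sequential read' ela= 358 file#=0
WAIT #0: nam='log file single write' ela= 470 log#=0 block#=1
WAIT #0: nam='log file single write' ela= 227 log#=1 block#=1
"""


def elapsed_by_event(trace_text):
    """Sum the elapsed time (assumed microseconds) per wait event name."""
    totals = defaultdict(int)
    for name, ela in WAIT_RE.findall(trace_text):
        totals[name] += int(ela)
    return dict(totals)


for event, micros in elapsed_by_event(TRACE).items():
    print(f"{event:32s} {micros / 1000:8.2f} ms")
```

In this sample the header reads take far longer than the single-block header writes.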
Starting with 10.2.0.3+, if a log write takes too long, Oracle dumps the related information:
    LGWR trace file:
    *** 2012-06-10 11:36:06.759
    Warning: log write time 690ms, size 19KB
    *** 2012-06-10 11:37:23.778
    Warning: log write time 52710ms, size 0KB
    *** 2012-06-10 11:37:27.302
    Warning: log write time 3520ms, size 144KB
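
These warnings are easy to scan mechanically. The sketch below extracts the write time and size from each warning line and picks out the worst one; `slow_writes` is a hypothetical helper and the pattern assumes the exact message wording shown above.

```python
import re

WARN_RE = re.compile(r"Warning: log write time (\d+)ms, size (\d+)KB")

# Excerpt copied from the LGWR trace file above.
LGWR_TRACE = """\
*** 2012-06-10 11:36:06.759
Warning: log write time 690ms, size 19KB
*** 2012-06-10 11:37:23.778
Warning: log write time 52710ms, size 0KB
*** 2012-06-10 11:37:27.302
Warning: log write time 3520ms, size 144KB
"""


def slow_writes(trace_text):
    """Return a list of (write_time_ms, size_kb) tuples, one per warning."""
    return [(int(ms), int(kb)) for ms, kb in WARN_RE.findall(trace_text)]


writes = slow_writes(LGWR_TRACE)
worst_ms, worst_kb = max(writes)
print(f"{len(writes)} slow writes, worst {worst_ms} ms for {worst_kb} KB")
```

Note that the slowest write in this excerpt is also the smallest one, so the delay cannot be explained by the amount of redo being written.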

Now look at the AWR information from one of our databases:

    Load Profile              Per Second    Per Transaction   Per Exec   Per Call
    ~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
           Redo size:       15,875,849.0          121,482.8
       Logical reads:           42,403.5              324.5
       Block changes:           34,759.1              266.0
      Physical reads:               46.0                0.4
     Physical writes:            3,417.9               26.2
          User calls:              569.6                4.4
              Parses:              292.3                2.2
         Hard parses:                0.1                0.0
    W/A MB processed:                0.5                0.0
              Logons:               10.7                0.1
            Executes:              552.8                4.2
           Rollbacks:               42.9                0.3
        Transactions:              130.7

    Top 5 Timed Foreground Events
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                               Avg
                                                              wait   % DB
    Event                                 Waits     Time(s)   (ms)   time Wait Class
    ------------------------------ ------------ ----------- ------ ------ ----------
    DB CPU                                           37,301          76.5
    log file sync                     1,665,900       7,732      5   15.9 Commit
    db file sequential read             711,221       6,614      9   13.6 User I/O
    buffer busy waits                   366,589         440      1     .9 Concurrenc
    gc current multi block request      192,791         230      1     .5 Cluster

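This is node 2 of the database. It is not very busy, but the workload writes a lot each time, and log file sync accounts for about 16% of DB time.

Now the background process waits: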

                                                                 Avg
                                            %Time Total Wait    wait    Waits   % bg
    Event                             Waits -outs   Time (s)    (ms)     /txn   time
    -------------------------- ------------ ----- ---------- ------- -------- ------
    db file parallel write       11,968,325     0     24,481       2      5.7   71.2
    log file parallel write       1,503,192     0      3,863       3      0.7   11.2

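*** As shown above, log file sync accounts for 16% of DB time with an average wait of 5 ms. So how much of that is LGWR waiting on I/O? Going by the rounded averages, log file parallel write (3 ms) accounts for roughly 60% of each log file sync wait, leaving about 40% for everything else. The main cause is the workload itself: we have already reached the I/O limit of our storage.

*** This database averages around 200 MB of throughput per second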
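
The averages in the AWR tables are rounded to whole milliseconds, so it is worth redoing the division from the wait counts and total times. The sketch below is plain arithmetic on the figures in the two AWR sections above; the function name `avg_wait_ms` is made up.

```python
def avg_wait_ms(total_seconds, waits):
    """Average wait in milliseconds from AWR 'Time(s)' and 'Waits' columns."""
    return total_seconds * 1000 / waits


# Figures copied from the foreground and background wait tables above.
avg_lfs = avg_wait_ms(7_732, 1_665_900)    # log file sync
avg_lfpw = avg_wait_ms(3_863, 1_503_192)   # log file parallel write
io_share = avg_lfpw / avg_lfs

print(f"avg log file sync          : {avg_lfs:.2f} ms")
print(f"avg log file parallel write: {avg_lfpw:.2f} ms")
print(f"I/O share of log file sync : {io_share:.0%}")
```

With the unrounded totals, the LGWR write comes out at a little over half of the average log file sync wait; the remainder is time spent outside the redo write itself.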

Below is part of the information I collected with the lfsdiag.sql script:

   INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
---------- ---------------------------------------- --------------- ----------
         1 wait for scn ack                                       1    4243024
         1 wait for scn ack                                       2     728196
         1 wait for scn ack                                       4    1133400
         1 wait for scn ack                                       8    1157120
         1 wait for scn ack                                      16      88333
         1 wait for scn ack                                      32       3883
         1 wait for scn ack                                      64        429
         1 wait for scn ack                                     128         80
         1 wait for scn ack                                     256         34
         1 wait for scn ack                                     512         48
         2 wait for scn ack                                       1   55024800
         2 wait for scn ack                                       2    6658764
         2 wait for scn ack                                       4    6802492
         2 wait for scn ack                                       8    4400949
         2 wait for scn ack                                      16     564950
         2 wait for scn ack                                      32      21712
         2 wait for scn ack                                      64       3190
         2 wait for scn ack                                     128        912
         2 wait for scn ack                                     256        390
         2 wait for scn ack                                     512        508
         1 log file sync                                          1   49708644
         1 log file sync                                          2    4285471
         1 log file sync                                          4    3929029
         1 log file sync                                          8    2273533
         1 log file sync                                         16     709349
         1 log file sync                                         32     257827
         1 log file sync                                         64      10464
         1 log file sync                                        128       2371
         1 log file sync                                        256       1582
         1 log file sync                                        512       1979
         1 log file sync                                       1024       1200
         2 log file sync                                          1  647580137
         2 log file sync                                          2   56421028
         2 log file sync                                          4   42559988
         2 log file sync                                          8   26002340
         2 log file sync                                         16   12821558
         2 log file sync                                         32    4429073
         2 log file sync                                         64     229397
         2 log file sync                                        128      42685
         2 log file sync                                        256      22693
         2 log file sync                                        512      23922
         2 log file sync                                       1024     215090
         1 log file switch completion                             1        141
         1 log file switch completion                             2         27
         1 log file switch completion                             4         35
         1 log file switch completion                             8         72
         1 log file switch completion                            16        237
         1 log file switch completion                            32        453
         1 log file switch completion                            64        387
         1 log file switch completion                           128         31
         2 log file switch completion                             1        956
         2 log file switch completion                             2        508
         2 log file switch completion                             4       1005
         2 log file switch completion                             8       1858
         2 log file switch completion                            16       4506
         2 log file switch completion                            32       5569
         2 log file switch completion                            64       6957
         2 log file switch completion                           128        319
         2 log file switch completion                           256         24
         2 log file switch completion                           512        108
         2 log file switch completion                          1024          1
         1 log file parallel write                                1   56713575
         1 log file parallel write                                2    2952904
         1 log file parallel write                                4    1832942
         1 log file parallel write                                8     785097
         1 log file parallel write                               16     386755
         1 log file parallel write                               32     229099
         1 log file parallel write                               64       8552
         1 log file parallel write                              128       1461
         1 log file parallel write                              256        914
         1 log file parallel write                              512        231
         1 log file parallel write                             1024         21
         1 log file parallel write                             2048          3
         2 log file parallel write                                1  708078642
         2 log file parallel write                                2   31616460
         2 log file parallel write                                4   16087368
         2 log file parallel write                                8    5656461
         2 log file parallel write                               16    3121042
         2 log file parallel write                               32    1995505
         2 log file parallel write                               64      44298
         2 log file parallel write                              128       7506
         2 log file parallel write                              256       2582
         2 log file parallel write                              512        536
         2 log file parallel write                             1024        464
         2 log file parallel write                             2048         26
         2 log file parallel write                             4096          0
         2 log file parallel write                             8192          0
         2 log file parallel write                            16384          0
         2 log file parallel write                            32768          0
         2 log file parallel write                            65536          0
         2 log file parallel write                           131072          0
         2 log file parallel write                           262144          0
         2 log file parallel write                           524288          1
         1 gcs log flush sync                                     1    4366103
         1 gcs log flush sync                                     2      72108
         1 gcs log flush sync                                     4      52374
         1 gcs log flush sync                                     8      23374
         1 gcs log flush sync                                    16      13518
         1 gcs log flush sync                                    32      12450
         1 gcs log flush sync                                    64      11307
         1 gcs log flush sync                                   128          4
         2 gcs log flush sync                                     1    9495464
         2 gcs log flush sync                                     2     263718
         2 gcs log flush sync                                     4     222876
         2 gcs log flush sync                                     8     148562
         2 gcs log flush sync                                    16      68586
         2 gcs log flush sync                                    32      33704
         2 gcs log flush sync                                    64       5231
         2 gcs log flush sync                                   128          1
         1 gc current block 2-way                                 1   30064919
         1 gc current block 2-way                                 2     353563
         1 gc current block 2-way                                 4     239425
         1 gc current block 2-way                                 8      29994
         1 gc current block 2-way                                16       3203
         1 gc current block 2-way                                32       1661
         1 gc current block 2-way                                64       1501
         1 gc current block 2-way                               128        273
         1 gc current block 2-way                               256        153
         1 gc current block 2-way                               512         22
         1 gc current block 2-way                              1024        119
         2 gc current block 2-way                                 1   36168617
         2 gc current block 2-way                                 2     303236
         2 gc current block 2-way                                 4     148934
         2 gc current block 2-way                                 8      13304
         2 gc current block 2-way                                16       2140
         2 gc current block 2-way                                32       1635
         2 gc current block 2-way                                64       1114
         2 gc current block 2-way                               128        210
         2 gc current block 2-way                               256         28
         2 gc current block 2-way                               512         12
         2 gc current block 2-way                              1024         12
         2 gc current block 2-way                              2048          3
         2 gc current block 2-way                              4096          2
         1 gc cr grant 2-way                                      1   76502000
         1 gc cr grant 2-way                                      2     476023
         1 gc cr grant 2-way                                      4     564802
         1 gc cr grant 2-way                                      8      61560
         1 gc cr grant 2-way                                     16       5657
         1 gc cr grant 2-way                                     32       3011
         1 gc cr grant 2-way                                     64        440
         1 gc cr grant 2-way                                    128        217
         1 gc cr grant 2-way                                    256          6
         2 gc cr grant 2-way                                      1  155966394
         2 gc cr grant 2-way                                      2     740788
         2 gc cr grant 2-way                                      4     748834
         2 gc cr grant 2-way                                      8      59464
         2 gc cr grant 2-way                                     16       9889
         2 gc cr grant 2-way                                     32       7236
         2 gc cr grant 2-way                                     64        937
         2 gc cr grant 2-way                                    128        463
         2 gc cr grant 2-way                                    256         14
         2 gc cr grant 2-way                                    512          5
         2 gc cr grant 2-way                                   1024         10
         2 gc cr grant 2-way                                   2048          2
         2 gc cr grant 2-way                                   4096          4
         2 gc cr grant 2-way                                   8192          1
         1 gc buffer busy                                         1   34252868
         1 gc buffer busy                                         2   18723990
         1 gc buffer busy                                         4    9528539
         1 gc buffer busy                                         8    4351426
         1 gc buffer busy                                        16    3691918
         1 gc buffer busy                                        32     755331
         1 gc buffer busy                                        64      68712
         1 gc buffer busy                                       128      10869
         1 gc buffer busy                                       256       2553
         1 gc buffer busy                                       512        337
         1 gc buffer busy                                      1024       2933
         2 gc buffer busy                                         1    7881434
         2 gc buffer busy                                         2    2083189
         2 gc buffer busy                                         4    1372486
         2 gc buffer busy                                         8    1957290
         2 gc buffer busy                                        16    1417604
         2 gc buffer busy                                        32     448992
         2 gc buffer busy                                        64     544446
         2 gc buffer busy                                       128     202888
         2 gc buffer busy                                       256      58584
         2 gc buffer busy                                       512      16470
         2 gc buffer busy                                      1024      91266
         2 gc buffer busy                                      2048         14
         1 LGWR wait for redo copy                                1     278115
         1 LGWR wait for redo copy                                2       3698
         1 LGWR wait for redo copy                                4       8498
         1 LGWR wait for redo copy                                8        220
         1 LGWR wait for redo copy                               16          6
         1 LGWR wait for redo copy                               32          1
         2 LGWR wait for redo copy                                1    7935371
         2 LGWR wait for redo copy                                2      29915
         2 LGWR wait for redo copy                                4      58179
         2 LGWR wait for redo copy                                8       2472
         2 LGWR wait for redo copy                               16        204
         2 LGWR wait for redo copy                               32         47
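This information comes mainly from V$EVENT_HISTOGRAM and is a useful reference when working out what is actually causing the problem.

*** Oracle has a bug where log file sync waits for 1 s; cursor: pin S wait on X also has a bug that causes unexplained 10 ms waits.

How to tune log file sync: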
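
A histogram like this is easier to judge as a percentage of waits above some threshold. The sketch below does that for the instance 1 log file sync buckets listed above. It assumes that each WAIT_TIME_MILLI value is the upper bound of its bucket, so a bucket of 16 holds waits of 8 ms up to 16 ms; the function name `pct_at_or_above` and the 8 ms threshold are arbitrary choices for illustration.

```python
# WAIT_TIME_MILLI -> WAIT_COUNT for 'log file sync' on instance 1 (from the output above).
LFS_INST1 = {
    1: 49708644, 2: 4285471, 4: 3929029, 8: 2273533, 16: 709349, 32: 257827,
    64: 10464, 128: 2371, 256: 1582, 512: 1979, 1024: 1200,
}


def pct_at_or_above(buckets, threshold_ms):
    """Percentage of waits that fall in buckets above the given upper bound."""
    total = sum(buckets.values())
    slow = sum(count for upper, count in buckets.items() if upper > threshold_ms)
    return 100.0 * slow / total


total = sum(LFS_INST1.values())
print(f"total waits       : {total}")
print(f"waits of 8 ms or more: {pct_at_or_above(LFS_INST1, 8):.2f}%")
```

The average hides the tail: most commits return within a millisecond, and it is the small share of long waits that users notice.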

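As for the log buffer size, as mentioned earlier, the default is already large enough. From experience, though (mostly that of more senior colleagues; I have tried it myself and the effect was not obvious), you can consider increasing the log buffer size when there are many waits on the redo allocation latches. Since 9.2 Oracle has used multiple log buffers, which also relieves these latch waits to a large extent (each latch protects its own buffers). 10g introduced private redo strands, where each allocation latch protects its own redo strand, together with the concept of IMU (in-memory undo), so contention on the log buffer latches is now rare.

Now the relevant parameters: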

10g R2: commit_write

11g: commit_logging and commit_wait (commit_write is deprecated from 11g on)

Option / Effect

WAIT (default)

Ensures that the commit returns only after the corresponding redo information is persistent in the online redo log. When the client receives a successful return from this COMMIT statement, the transaction has been committed to durable media.

A failure that occurs after a successful write to the log might prevent the success message from returning to the client, in which case the client cannot tell whether or not the transaction committed.

NOWAIT

The commit returns to the client whether or not the write to the redo log has completed. This behavior can increase transaction throughput.

BATCH

The redo information is buffered to the redo log, along with other concurrently executing transactions. When sufficient redo information is collected, a disk write to the redo log is initiated. This behavior is called group commit, as redo information for multiple transactions is written to the log in a single I/O operation.

IMMEDIATE (default)

LGWR writes the transaction's redo information to the log. Because this operation option forces a disk I/O, it can reduce transaction throughput.

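The above comes from the online documentation. If you do not care about the D (durability) in ACID, that is, you accept the risk of losing data after an instance crash, you can use nowait. So far, though, I have not seen a system that uses this parameter; they all run with the default.

In many cases, tuning purely at the DB level does not achieve much, while changes at the application level can be very effective. Getting the application team to change its code, however, is about as hard as it gets.

1. PL/SQL, including log file sync caused by procedures. Consider the following example: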
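
The same options can also be given per transaction on the COMMIT statement itself, which limits the durability trade-off to the code paths that can tolerate it. The helper below only builds the statement text; `commit_statement` is a hypothetical function, and the clause order follows the COMMIT ... WRITE syntax as I understand it for 10g R2, so check it against the SQL reference for your version before relying on it.

```python
def commit_statement(wait=True, batch=False):
    """Build a COMMIT statement with explicit WRITE options.

    The defaults (IMMEDIATE, WAIT) correspond to the documented default behavior.
    """
    flush = "BATCH" if batch else "IMMEDIATE"
    sync = "WAIT" if wait else "NOWAIT"
    return f"COMMIT WRITE {flush} {sync}"


# Durable commit: the session waits for the redo write.
print(commit_statement())

# Relaxed commit for work that can be redone after a crash.
print(commit_statement(wait=False, batch=True))
```

The resulting string would then be executed through whatever database driver the application already uses.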

    SW_SID  STATE     SW_EVENT       SEQ#     SEC_IN_WAIT   SQL_ID        SQL_TEXT
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    2962     WAITING    log file sync    60440    0         773b3nqmcvwf4    BEGIN P_MS_UPDATE_SENDTASK(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17); END;
    2962     WAITING    log file sync    60440    0         773b3nqmcvwf4    BEGIN P_MS_UPDATE_SENDTASK(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16,:17); END;

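Stored procedures of this kind contain all sorts of updates, inserts, and deletes, and each call processes a lot of data. All we can do is change them and split up the business logic so that each commit covers a reasonably sized batch.

CPU:

Tuning the CPU configuration is another option. LGWR consumes a lot of CPU, so if LGWR's scheduling delays are severe, you can raise LGWR to a higher priority.

I/O:

If your storage I/O is the bottleneck, the log file parallel write event will be prominent, so this has to be tuned at the storage level, for example by using raw devices, ASM, or faster storage devices.

Below is the log file sync curve of the system running the BEGIN ... END call shown above. When the procedure is executed heavily, the average wait time goes up, as the figure shows:

To be continued...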
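
The effect of the commit batch size on the number of commits, and therefore on the number of log file sync waits, can be shown with a small loop. This sketch uses Python's built-in sqlite3 module purely as a runnable stand-in for an Oracle connection; the table `send_task`, the functions `load_rows` and `demo`, and the batch sizes are all invented for the example.

```python
import sqlite3


def load_rows(conn, rows, batch_size):
    """Insert rows, committing once per batch_size rows. Returns the commit count."""
    commits = 0
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO send_task (id, status) VALUES (?, ?)", row)
        if i % batch_size == 0:
            conn.commit()          # each commit is one wait on the redo write
            commits += 1
    if len(rows) % batch_size:     # commit the final partial batch
        conn.commit()
        commits += 1
    return commits


def demo(batch_size, n_rows=1000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE send_task (id INTEGER PRIMARY KEY, status TEXT)")
    rows = [(i, "SENT") for i in range(1, n_rows + 1)]
    commits = load_rows(conn, rows, batch_size)
    stored = conn.execute("SELECT COUNT(*) FROM send_task").fetchone()[0]
    conn.close()
    return commits, stored


for size in (1, 100):
    commits, stored = demo(size)
    print(f"batch size {size:3d}: {commits} commits for {stored} rows")
```

The right batch size is a trade-off: very small batches multiply the commit waits, while very large ones make each commit flush more redo and hold locks longer.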
