What You May Not Know About the Oracle Background Process SMON


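SMON (system monitor process) is the system monitor background process. It is sometimes also called the system cleanup process, because it is responsible for many cleanup tasks. Anyone who has studied Oracle fundamentals will know at least something about what this background process does.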

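The SMON we know is a conscientious worker that carries out a series of system-level tasks. Unlike the PMON (Process Monitor) background process, SMON handles work that concerns the system as a whole, which means it ends up doing a good deal of obscure "grunt work". When the system generates such "garbage tasks" frequently, SMON may not be able to keep up. For that reason SMON became a little lazier in 10g: if it receives too many work notifications within a short period (SMON: system monitor process posted), it may deliberately slack off so as not to become too busy (SMON: Posted too frequently, trans recovery disabled). This is covered in detail later.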

SMON Functions You May Not Know: Cleaning Up Temporary Segments

Trigger Scenarios

Many people misunderstand the temporary segments meant here and assume they are the sort segments in a temporary tablespace. In fact, the temporary segments in question are mainly those in permanent tablespaces. Temporary segments in temporary tablespaces are also cleaned up by SMON, but that cleanup happens only at instance startup.

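Permanent tablespaces contain temporary segments too. For example, when we run a CREATE TABLE or CREATE INDEX DDL command in a permanent tablespace, the server process first allocates enough extents in that tablespace. These extents remain temporary (temporary extents) until the command finishes; only once the table or index is fully built is the temporary segment converted into a permanent segment. Likewise, when a segment is dropped with the DROP command, it is first converted into a temporary segment, and that temporary segment is then cleaned up (DROP object converts the segment to temporary and then cleans up the temporary segment). Under normal circumstances cleanup follows the rule that whoever creates a temporary segment cleans it up. In other words, the temporary segment produced by a server process during an index rebuild should be cleaned up by that server process itself once the rebuild completes. If the server process terminates unexpectedly before it has cleaned up the temporary segment, or hits an ORA- error that makes the statement fail, SMON is posted to complete the cleanup of the temporary segment.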

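For temporary segments in permanent tablespaces, SMON performs cleanup once every three minutes (provided it has been posted). If SMON is too busy, temporary segments may go uncleaned for a long time. A typical problem this causes: after an online index rebuild fails, a subsequent rebuild index command requires that the temporary segment left behind has already been cleaned up, and if cleanup has not finished it has to keep waiting. In 10gR2 we can use dbms_repair.online_index_clean to clean up the leftovers of an online index rebuild manually: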

The dbms_repair.online_index_clean function has been created to cleanup online index rebuilds.
Use the dbms_repair.online_index_clean function to resolve the issue.

Please note if you are unable to run the dbms_repair.online_index_clean function it is due to the fact that you have not installed the patch for Bug 3805539 or are not running on a release that includes this fix.

The fix for this bug is a new function in the dbms_repair package called dbms_repair.online_index_clean, which has been created to cleanup online index [[sub]partition] [re]builds.

New functionality is not allowed in patchsets; therefore, this is not available in a patchset but is available in 10gR2.

Check your patch list to verify the database is patched for Bug 3805539 using the following command and patch for the bug if it is not listed:

opatch lsinventory -detail

Cleanup after a failed online index [re]build can be slow to occur, preventing subsequent such operations until the cleanup has occurred.
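As a rough illustration of the manual cleanup described in the note, the PL/SQL block below calls the function for all indexes. It is a minimal sketch, assuming a release that ships DBMS_REPAIR.ONLINE_INDEX_CLEAN (10gR2 or a patched earlier release) and a session with SYSDBA privileges; the variable name l_cleaned is made up for this example.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- l_cleaned is a hypothetical local name, not part of the package
  l_cleaned BOOLEAN;
BEGIN
  -- ALL_INDEX_ID asks for every index left behind by a failed online
  -- rebuild; LOCK_WAIT retries until the underlying object can be locked
  l_cleaned := DBMS_REPAIR.ONLINE_INDEX_CLEAN(
                 object_id     => DBMS_REPAIR.ALL_INDEX_ID,
                 wait_for_lock => DBMS_REPAIR.LOCK_WAIT);

  IF l_cleaned THEN
    DBMS_OUTPUT.PUT_LINE('online index rebuild leftovers cleaned up');
  ELSE
    DBMS_OUTPUT.PUT_LINE('cleanup incomplete, retry later');
  END IF;
END;
/
```

To clean a single index, pass its OBJECT_ID from DBA_OBJECTS instead of DBMS_REPAIR.ALL_INDEX_ID.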

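Next, let's see in practice how SMON cleans up temporary segments in a permanent tablespace:

Set event 10500 to trace the SMON process. This diagnostic event is described later.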

SQL> alter system set events '10500 trace name context forever,level 10';

System altered.

In the first session, run a create table command, which produces a number of temporary extents:

SQL> create table smon as select * from ymon;

In another session, query the DBA_EXTENTS view to see how many temporary extents have been created:

SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';

COUNT(*)

----------

117

Terminate the session running the create table above, wait a while, and then check the SMON background process trace file, which shows the following:

*** 2011-06-07 21:18:39.817

SMON: system monitor process posted msgflag:0x0200 (-/-/-/-/TMPSDROP/-/-)

*** 2011-06-07 21:18:39.818

SMON: Posted, but not for trans recovery, so skip it.

*** 2011-06-07 21:18:39.818

SMON: clean up temp segments in slave

SQL> SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';

COUNT(*)

----------

0

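As we can see, SMON cleaned up the temporary segment through a slave process.

Unlike temporary segments in permanent tablespaces, extents in temporary tablespaces are, for performance reasons, not released and returned as soon as an operation completes. Instead, these temporary extents are marked as available so that the next sort operation can reuse them. SMON still cleans up these temporary segments, but only at instance startup: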

For performance issues, extents in TEMPORARY tablespaces are not released or deallocated once the operation is complete. Instead, the extent is simply marked as available for the next sort operation.

SMON cleans up the segments at startup.

A sort segment is created by the first statement that used a TEMPORARY tablespace for sorting, after startup.

A sort segment created in a TEMPORARY tablespace is only released at shutdown.

The large number of EXTENTS is caused when the STORAGE clause has been incorrectly calculated.
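The reuse of sort-segment extents described above can be observed through the V$SORT_SEGMENT view. The query below is only a sketch: it assumes a session with access to the V$ views, and the figures it returns depend entirely on the workload since the last startup.

```sql
-- One row per sort segment created since instance startup.
-- FREE_EXTENTS are extents already allocated to the sort segment
-- and marked available for the next sort, not returned to the tablespace.
SELECT tablespace_name,
       current_users,
       total_extents,
       used_extents,
       free_extents
  FROM v$sort_segment
 ORDER BY tablespace_name;
```

A high TOTAL_EXTENTS with a low USED_EXTENTS is the expected steady state rather than a sign of a leak.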

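Symptoms

The following query shows the total number of temporary extents in the database. Compare the total over a period of time; if it is decreasing, SMON is cleaning up temporary segments: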

SELECT COUNT(*) FROM DBA_EXTENTS WHERE SEGMENT_TYPE='TEMPORARY';
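To see where those temporary extents live, the count can be broken down by tablespace and restricted to permanent tablespaces. This is a sketch under the assumption that the session can read DBA_SEGMENTS and DBA_TABLESPACES; it is not taken from the original post.

```sql
-- Temporary segments that sit in PERMANENT tablespaces,
-- i.e. the ones SMON is expected to clean up when posted.
SELECT s.tablespace_name,
       COUNT(*)       AS temp_segments,
       SUM(s.extents) AS temp_extents,
       SUM(s.bytes)   AS temp_bytes
  FROM dba_segments s,
       dba_tablespaces t
 WHERE t.tablespace_name = s.tablespace_name
   AND t.contents        = 'PERMANENT'
   AND s.segment_type    = 'TEMPORARY'
 GROUP BY s.tablespace_name
 ORDER BY temp_extents DESC;
```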

The "SMON posted for dropping temp segment" statistic in the v$sysstat view also shows how often SMON has been asked to clean up:

SQL> select name,value from v$sysstat where name like '%SMON%';

NAME                                                                  VALUE

---------------------------------------------------------------- ----------

total number of times SMON posted                                         8

SMON posted for undo segment recovery                                     0

SMON posted for txn recovery for other instances                          0

SMON posted for instance recovery                                         0

SMON posted for undo segment shrink                                       0

SMON posted for dropping temp segment                                     1

In addition, SMON holds the Space Transaction (ST) enqueue for a long time during cleanup, so other sessions may time out waiting for the ST enqueue and fail with ORA-01575:

01575, 00000, "timeout waiting for space management resource"

// *Cause: failed to acquire necessary resource to do space management.

// *Action: Retry the operation.

How to Stop SMON from Cleaning Up Temporary Segments

Setting the diagnostic event event='10061 trace name context forever, level 10' disables SMON cleanup of temporary segments (disable SMON from cleaning temp segments):

alter system set events '10061 trace name context forever, level 10';
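Events set this way stay active until the instance restarts or they are switched off. The statements below sketch how the two events used in this section could be turned off again once the diagnosis is finished; they assume the events were set with alter system as shown above, not through the parameter file.

```sql
-- Re-enable SMON cleanup of temporary segments
alter system set events '10061 trace name context off';

-- Stop the SMON tracing enabled earlier with event 10500
alter system set events '10500 trace name context off';
```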

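Related Diagnostic Events

Besides event 10061, event 10500 can be used to trace SMON post information. For how to set it, see <EVENT: 10500 "turn on traces for SMON">.

SMON Functions You May Not Know: Coalescing Free Extents

SMON's duties also include coalescing free extents.

Trigger Scenarios

Early Oracle releases used dictionary-managed tablespaces (DMT). Unlike today's locally managed tablespaces (LMT), DMT manages extents through recursive operations on two dictionary base tables, FET$ and UET$. Every 5 minutes SMON wakes up on its own (SMON wakes itself every 5 minutes and checks for tablespaces with default pctincrease != 0) and checks for dictionary-managed tablespaces whose default storage parameter pctincrease is not 0. Note that this work applies only to DMT; LMT needs no coalescing. On these DMT tablespaces SMON coalesces contiguous, adjacent free extents into a single larger free extent, which also means SMON has to maintain the FET$ dictionary base table.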
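The tablespaces that qualify for this automatic coalescing can be listed from DBA_TABLESPACES. The query is a sketch assuming read access to the DBA views; on a database that uses only locally managed tablespaces it simply returns no rows.

```sql
-- Dictionary-managed tablespaces whose default PCTINCREASE is non-zero:
-- the candidates SMON checks for free-extent coalescing.
SELECT tablespace_name,
       extent_management,
       pct_increase
  FROM dba_tablespaces
 WHERE extent_management = 'DICTIONARY'
   AND contents          = 'PERMANENT'
   AND pct_increase     <> 0;
```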

Symptoms

The following query returns the total number of free extents in the database. If this total keeps decreasing, SMON is coalescing free space:

SELECT COUNT(*) FROM DBA_FREE_SPACE;

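While coalescing extents, SMON needs to hold the ST (Space Transaction) enqueue in exclusive mode, so other sessions may time out waiting for the ST enqueue and fail with ORA-01575. SMON may also consume 100% of a CPU during lengthy coalesce operations.

How to Stop SMON from Coalescing Free Extents

Setting the diagnostic event event='10269 trace name context forever, level 10' disables SMON coalescing of free extents (Don't do coalesces of free space in SMON):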

10269, 00000, "Don't do coalesces of free space in SMON"
// *Cause:    setting this event prevents SMON from doing free space coalesces
alter system set events '10269 trace name context forever, level 10';
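With SMON coalescing disabled, free space in a dictionary-managed tablespace can still be coalesced by hand during a quiet period. The sketch below assumes a dictionary-managed tablespace named USERS_DMT, which is a made-up name for illustration; the manual coalesce takes the ST enqueue as well, so it is best run off-peak.

```sql
-- How fragmented is the free space? 100 means nothing is left to coalesce.
SELECT tablespace_name,
       total_extents,
       extents_coalesced,
       percent_extents_coalesced
  FROM dba_free_space_coalesced
 WHERE tablespace_name = 'USERS_DMT';   -- hypothetical tablespace name

-- Merge adjacent free extents in this tablespace from the current session
ALTER TABLESPACE users_dmt COALESCE;
```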

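SMON Functions You May Not Know: Cleaning Up the obj$ Base Table

SMON's duties also include cleaning up the obj$ data dictionary base table (cleanup obj$).

The OBJ$ dictionary base table is one of the key objects in Oracle's bootstrap process: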

SQL> set linesize 80 ;
SQL> select sql_text from bootstrap$ where sql_text like 'CREATE TABLE OBJ$%';
SQL_TEXT
--------------------------------------------------------------------------------
CREATE TABLE OBJ$("OBJ#" NUMBER NOT NULL,"DATAOBJ#" NUMBER,"OWNER#" NUMBER NOT N
ULL,"NAME" VARCHAR2(30) NOT NULL,"NAMESPACE" NUMBER NOT NULL,"SUBNAME" VARCHAR2(
30),"TYPE#" NUMBER NOT NULL,"CTIME" DATE NOT NULL,"MTIME" DATE NOT NULL,"STIME"
DATE NOT NULL,"STATUS" NUMBER NOT NULL,"REMOTEOWNER" VARCHAR2(30),"LINKNAME" VAR
CHAR2(128),"FLAGS" NUMBER,"OID$" RAW(16),"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3
" NUMBER,"SPARE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) PCTFREE
10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE (  INITIAL 16K NEXT 1024K MINEXTEN
TS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 18 EXTENTS (FILE 1 BLOCK 121))

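Trigger Scenarios

OBJ$ is a low-level data dictionary table that contains one row for almost every object in the database (tables, indexes, packages, views and so on). In many cases these entries represent non-existent objects. One possible cause is that the object itself has been dropped from the database but its entry is retained to support the negative dependency mechanism. Because such entries make OBJ$ grow continually, the SMON process has to delete the rows that are no longer needed. SMON runs this cleanup at instance startup (after startup of DB is started cleanup function again) and then every 12 hours (the cleanup is scheduled to run after startup and then every 12 hours).

The following demonstration shows how SMON cleans up obj$: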

SQL>  BEGIN
  2      FOR i IN 1 .. 5000 LOOP
  3      execute immediate ('create synonym gustav' || i || ' for
  4  perfstat.sometable');
  5      execute immediate ('drop   synonym gustav' || i );
  6      END LOOP;
  7    END;
  8    /
PL/SQL procedure successfully completed.
SQL> startup force;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.
Database opened.
SQL>   select count(*) from user$ u, obj$ o
  2        where u.user# (+)=o.owner# and o.type#=10 and not exists
  3        (select p_obj# from dependency$ where p_obj# = o.obj#);
  COUNT(*)
----------
      5000
SQL> /
  COUNT(*)
----------
      5000
SQL> /
  COUNT(*)
----------
      4951
SQL> oradebug setospid 18457;
Oracle pid: 8, Unix process pid: 18457, image: [email protected] (SMON)
SQL> oradebug event 10046 trace name context forever ,level 1;
Statement processed.
SQL> oradebug tracefile_name;
/s01/admin/G10R2/bdump/g10r2_smon_18457.trc
select o.owner#,
       o.obj#,
       decode(o.linkname,
              null,
              decode(u.name, null, 'SYS', u.name),
              o.remoteowner),
       o.name,
       o.linkname,
       o.namespace,
       o.subname
  from user$ u, obj$ o
 where u.user#(+) = o.owner#
   and o.type# = :1
   and not exists
 (select p_obj# from dependency$ where p_obj# = o.obj#)
 order by o.obj#
   for update
select null
  from obj$
 where obj# = :1
   and type# = :2
   and obj# not in
       (select p_obj# from dependency$ where p_obj# = obj$.obj#)
delete from obj$ where obj# = :1
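/* The delete is actually more involved and may have to remove rows from several dictionary base tables */

Symptoms

The following queries show the total number of NON-EXISTENT object entries (type#=10) in obj$. If this total keeps decreasing, SMON is performing cleanup: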

    select trunc(mtime), substr(name, 1, 3) name, count(*)
      from obj$
     where type# = 10
       and not exists (select * from dependency$ where obj# = p_obj#)
     group by trunc(mtime), substr(name, 1, 3);
      select count(*)
        from user$ u, obj$ o
       where u.user#(+) = o.owner#
         and o.type# = 10
         and not exists
       (select p_obj# from dependency$ where p_obj# = o.obj#);

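How to Stop SMON from Cleaning Up the obj$ Base Table

Setting the diagnostic event event='10052 trace name context forever' stops SMON from cleaning up obj$. Set it when you need to keep SMON out of the obj$ cleanup code, where it would otherwise terminate unexpectedly or spin, so that further diagnosis can proceed. In an Oracle Parallel Server or RAC environment, the event can also be set so that only one particular node performs the cleanup.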

10052, 00000, "don't clean up obj$"
alter system set events '10052 trace name context forever, level 65535';
Problem Description: We are receiving the below warning during db startup:
WARNING: kqlclo() has detected the following :
Non-existent object 37336 NOT deleted because an object
of the same name exists already.
Object name: PUBLIC.USER$
This is caused by the SMON trying to cleanup the SYS.OBJ$.
SMON cleans all dropped objects which have a SYS.OBJ$.TYPE#=10.
This can happen very often when you create an object that has the same name as a public synonym.
It occurs when SMON is trying to remove non-existent objects and fails because there are duplicates,
that is, multiple nonexistent objects with the same name.
This query will return many objects with the same name under the SYS schema:
select o.name,u.user# from user$ u, obj$ o where u.user# (+)=o.owner# and o.type#=10 
and not exists (select p_obj# from dependency$ where p_obj# = o.obj#);
To cleanup this message:
Take a full backup of the database - this is crucial. If anything goes wrong during this procedure, 
your only option would be to restore from backup, so make sure you have a good backup before proceeding. 
We suggest a COLD backup. If you plan to use a HOT backup, you will have to restore point in time if any problem happens
Normally DML against dictionary objects is unsupported,
but in this case we know exactly what the type of corruption is,
and you are instructed to do this under guidance from Support.
Data dictionary patching must be done by an experienced DBA. 
This solution is unsupported. 
It means that if there were problems after applying this solution, a database backup must be restored.
1. Set event 10052 at parameter file to disable cleanup of OBJ$ by SMON
EVENT="10052 trace name context forever, level 65535"
2. Startup database in restricted mode
3. Delete from OBJ$, COMMIT
SQL> delete from obj$ where (name,owner#) in ( select o.name,u.user# from user$ u, obj$ o
where u.user# (+)=o.owner# and o.type#=10 and not exists (select p_obj# from
dependency$ where p_obj# = o.obj#) );
SQL> commit;
SQL> Shutdown abort.
4. remove event 10052 from init.ora
5. Restart the database and monitor for the message in the ALERT LOG file

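SMON Functions You May Not Know: Maintaining the col_usage$ Dictionary Base Table

SMON's duties also include maintaining col_usage$, the base table for column-usage monitoring statistics.

The col_usage$ dictionary base table was first introduced in 9i to monitor how columns are used as predicates in SQL statements. Its arrival completed the mechanism by which the CBO collects histograms automatically.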

create table col_usage$
(
  obj#              number,                                 /* object number */
  intcol#           number,                        /* internal column number */
  equality_preds    number,                           /* equality predicates */
  equijoin_preds    number,                           /* equijoin predicates */
  nonequijoin_preds number,                        /* nonequijoin predicates */
  range_preds       number,                              /* range predicates */
  like_preds        number,                         /* (not) like predicates */
  null_preds        number,                         /* (not) null predicates */
  timestamp         date      /* timestamp of last time this row was changed */
)
  storage (initial 200K next 100k maxextents unlimited pctincrease 0)
/
create unique index i_col_usage$ on col_usage$(obj#,intcol#)
  storage (maxextents unlimited)
/

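In 10g the default histogram collection mode is 'FOR ALL COLUMNS SIZE AUTO', whereas in 9i the default is 'SIZE 1', meaning histograms are not collected by default. As a result, many applications that ran fine on 9i got abnormal CBO execution plans on 10g; see <Differences in dbms_stats collection modes between 9i and 10g>. 'SIZE AUTO' means Oracle decides automatically whether to collect a histogram and how many buckets it should have. Oracle bases that decision on the col_usage$ dictionary base table: if a column of a table has served as a predicate (put simply, a condition after WHERE) in a SQL statement that was hard parsed, the column is considered worth a histogram, and a record of its use as a predicate is added to col_usage$. When the DBMS_STATS.GATHER_TABLE_STATS procedure runs in 'SIZE AUTO' mode, the collecting process checks col_usage$ to determine which columns have previously served as predicates; those that have are candidates for a histogram.

Every 15 minutes SMON flushes the predicate-column data held in the shared pool to the col_usage$ base table (until periodically about every 15 minutes SMON flush the data into the data dictionary). In addition, at instance shutdown SMON scans col_usage$, finds the predicate-column records that belong to tables which have been dropped, and deletes these "orphaned" records.

Let's look at how col_usage$ gets populated: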

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
SQL> create table maclean (t1 int);
Table created.
SQL> select object_id from dba_objects where object_name='MACLEAN';
 OBJECT_ID
----------
   1323013
SQL> select * from maclean where t1=1;
no rows selected
SQL> set linesize 200 pagesize 2000;
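Note that col_usage$ behaves like *_tab_modifications:
there is a delay between the query and the data being flushed to col_usage$,
so querying col_usage$ right away returns no rows.
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO can be run manually to flush the cached information to the dictionary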
SQL> select * from col_usage$ where obj#=1323013;
no rows selected
SQL> oradebug setmypid;
Statement processed.
Run a 10046 level 12 trace on the FLUSH_DATABASE_MONITORING_INFO flush
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.
SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
PL/SQL procedure successfully completed.
SQL> select * from col_usage$ where obj#=1323013;
      OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
   1323013          1              1              0                 0           0          0          0 19-AUG-11
=============10046 trace content====================
lock table sys.col_usage$ in exclusive mode nowait
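Testing shows that in 10.2.0.4 the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure first tries
lock in exclusive mode nowait to lock the col_usage$ base table.
If the lock fails it retries up to 1100 times;
if col_usage$ still cannot be locked, it gives up updating col_usage$, which avoids lock waits and deadlocks.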
Cksxm.c
Monitor Modification Hash Table Base
modification hash table entry
modification hash table chunk
monitoring column usage element
ksxmlock_1
lock table sys.col_usage$ in exclusive mode
lock table sys.col_usage$ in exclusive mode nowait
update sys.col_usage$
   set equality_preds    = equality_preds +
                           decode(bitand(:flag, 1), 0, 0, 1),
       equijoin_preds    = equijoin_preds +
                           decode(bitand(:flag, 2), 0, 0, 1),
       nonequijoin_preds = nonequijoin_preds +
                           decode(bitand(:flag, 4), 0, 0, 1),
       range_preds       = range_preds + decode(bitand(:flag, 8), 0, 0, 1),
       like_preds        = like_preds + decode(bitand(:flag, 16), 0, 0, 1),
       null_preds        = null_preds + decode(bitand(:flag, 32), 0, 0, 1),
       timestamp         = :time
 where obj# = :objn
   and intcol# = :coln
insert into sys.col_usage$
  (obj#,
   intcol#,
   equality_preds,
   equijoin_preds,
   nonequijoin_preds,
   range_preds,
   like_preds,
   null_preds,
   timestamp)
values
  (:objn,
   :coln,
   decode(bitand(:flag, 1), 0, 0, 1),
   decode(bitand(:flag, 2), 0, 0, 1),
   decode(bitand(:flag, 4), 0, 0, 1),
   decode(bitand(:flag, 8), 0, 0, 1),
   decode(bitand(:flag, 16), 0, 0, 1),
   decode(bitand(:flag, 32), 0, 0, 1),
   :time)
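Rows in col_usage$ are keyed by object number and internal column number, which makes them hard to read directly. The query below is a sketch that resolves those numbers to table and column names by joining OBJ$ and COL$; it assumes a SYS connection and uses the MACLEAN table from the demonstration above as the filter.

```sql
-- Predicate usage per column, with readable names.
-- Each *_preds counter is incremented when the matching flag bit
-- (1, 2, 4, 8, 16, 32) is set in the flushed column-usage element.
SELECT o.name  AS table_name,
       c.name  AS column_name,
       cu.equality_preds,
       cu.equijoin_preds,
       cu.nonequijoin_preds,
       cu.range_preds,
       cu.like_preds,
       cu.null_preds,
       cu.timestamp
  FROM sys.col_usage$ cu,
       sys.obj$       o,
       sys.col$       c
 WHERE o.obj#    = cu.obj#
   AND c.obj#    = cu.obj#
   AND c.intcol# = cu.intcol#
   AND o.name    = 'MACLEAN'
 ORDER BY c.intcol#;
```

If several schemas own a table with the same name, add a filter on o.owner# to narrow the result.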

When dbms_stats gathers statistics on a table in 'SIZE AUTO' mode, it first consults the predicate-column records in col_usage$:

SQL> begin
  2
  3    dbms_stats.gather_table_stats(ownname    => 'SYS',
  4                                  tabname    => 'MACLEAN',
  5                                  method_opt => 'FOR ALL COLUMNS SIZE AUTO');
  6  end;
  7  /
PL/SQL procedure successfully completed.
============10046 level 12 trace content======================
SELECT /*+ ordered use_nl(o c cu h) index(u i_user1) index(o i_obj2)
               index(ci_obj#) index(cu i_col_usage$)
               index(h i_hh_obj#_intcol#) */
 C.NAME COL_NAME,
 C.TYPE# COL_TYPE,
 C.CHARSETFORM COL_CSF,
 C.DEFAULT$ COL_DEF,
 C.NULL$ COL_NULL,
 C.PROPERTY COL_PROP,
 C.COL# COL_UNUM,
 C.INTCOL# COL_INUM,
 C.OBJ# COL_OBJ,
 C.SCALE COL_SCALE,
 H.BUCKET_CNT H_BCNT,
 (T.ROWCNT - H.NULL_CNT) / GREATEST(H.DISTCNT, 1) H_PFREQ,
 C.LENGTH COL_LEN,
 CU.TIMESTAMP CU_TIME,
 CU.EQUALITY_PREDS CU_EP,
 CU.EQUIJOIN_PREDS CU_EJP,
 CU.RANGE_PREDS CU_RP,
 CU.LIKE_PREDS CU_LP,
 CU.NONEQUIJOIN_PREDS CU_NEJP,
 CU.NULL_PREDS NP
  FROM SYS.USER$      U,
       SYS.OBJ$       O,
       SYS.TAB$       T,
       SYS.COL$       C,
       SYS.COL_USAGE$ CU,
       SYS.HIST_HEAD$ H
 WHERE :B3 = '0'
   AND U.NAME = :B2
   AND O.OWNER# = U.USER#
   AND O.TYPE# = 2
   AND O.NAME = :B1
   AND O.OBJ# = T.OBJ#
   AND O.OBJ# = C.OBJ#
   AND C.OBJ# = CU.OBJ#(+)
   AND C.INTCOL# = CU.INTCOL#(+)
   AND C.OBJ# = H.OBJ#(+)
   AND C.INTCOL# = H.INTCOL#(+)
UNION ALL
SELECT /*+
ordered use_nl(c) */
 C.KQFCONAM COL_NAME,
 C.KQFCODTY COL_TYPE,
 DECODE(C.KQFCODTY, 1, 1, 0) COL_CSF,
 NULL COL_DEF,
 0 COL_NULL,
 0 COL_PROP,
 C.KQFCOCNO COL_UNUM,
 C.KQFCOCNO COL_INUM,
 O.KQFTAOBJ COL_OBJ,
 DECODE(C.KQFCODTY, 2, -127, 0) COL_SCALE,
 H.BUCKET_CNT H_BCNT,
 (ST.ROWCNT - NULL_CNT) / GREATEST(H.DISTCNT, 1) H_PFREQ,
 DECODE(C.KQFCODTY, 2, 22, C.KQFCOSIZ) COL_LEN,
 CU.TIMESTAMP CU_TIME,
 CU.EQUALITY_PREDS CU_EP,
 CU.EQUIJOIN_PREDS CU_EJP,
 CU.RANGE_PREDS CU_RP,
 CU.LIKE_PREDS CU_LP,
 CU.NONEQUIJOIN_PREDS CU_NEJP,
 CU.NULL_PREDS NP
  FROM SYS.X$KQFTA    O,
       SYS.TAB_STATS$ ST,
       SYS.X$KQFCO    C,
       SYS.COL_USAGE$ CU,
       SYS.HIST_HEAD$ H
 WHERE :B3 != '0'
   AND :B2 = 'SYS'
   AND O.KQFTANAM = :B1
   AND O.KQFTAOBJ = ST.OBJ#(+)
   AND O.KQFTAOBJ = C.KQFCOTOB
   AND C.KQFCOTOB = CU.OBJ#(+)
   AND C.KQFCOCNO = CU.INTCOL#(+)
   AND C.KQFCOTOB = H.OBJ#(+)
   AND C.KQFCOCNO = H.INTCOL#(+)

Symptoms

According to Metalink Note <Database Shutdown Immediate Takes Forever, Can Only Do Shutdown Abort [ID 332177.1]>:

Database Shutdown Immediate Takes Forever, Can Only Do Shutdown Abort [ID 332177.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.4.0
This problem can occur on any platform.
Symptoms
The database is not shutting down for a considerable time when you issue the command :
shutdown immediate
To shut it down in a reasonable time you have to issue the command
shutdown abort
To collect some diagnostics before issuing the shutdown immediate command set a trace event as follows:
Connect as SYS (/ as sysdba)
SQL> alter session set events '10046 trace name context forever,level 12';
SQL> shutdown immediate;
In the resultant trace file (within the udump directory) you see something similar to the following :-
PARSING IN CURSOR #n
delete from sys.col_usage$ c where not exists   (select 1 from sys.obj$ o where o.obj# = c.obj# )
...followed by loads of.....
WAIT #2: nam='db file sequential read' ela= 23424 p1=1 p2=4073 p3=1
....
WAIT #2: nam='db file scattered read' ela= 1558 p1=1 p2=44161 p3=8
etc
Then eventually
WAIT #2: nam='log file sync' ela= 32535 p1=4111 p2=0 p3=0
...some other SQL....then back to
WAIT #2: nam='db file sequential read' ela= 205 p1=1 p2=107925 p3=1
WAIT #2: nam='db file sequential read' ela= 1212 p1=1 p2=107926 p3=1
WAIT #2: nam='db file sequential read' ela= 212 p1=1 p2=107927 p3=1
WAIT #2: nam='db file scattered read' ela= 1861 p1=1 p2=102625 p3=8
etc....
To verify which objects are involved here you can use a couple of the P1 & P2 values from above
:-
a) a sequential read
SELECT owner,segment_name,segment_type
FROM dba_extents
WHERE file_id=1
AND 107927 BETWEEN block_id AND block_id + blocks
b) a scattered read
SELECT owner,segment_name,segment_type
FROM dba_extents
WHERE file_id=1
AND 102625 BETWEEN block_id AND block_id + blocks
The output confirms that the objects are
SYS.I_COL_USAGE$  (INDEX)   and   SYS.COL_USAGE$ (TABLE)
Finally, issue select count(*) from sys.col_usage$;
Cause
If the number of entries in sys.col_usage$ is large then you are very probably hitting the issue raised in
Bug: 3540022 9.2.0.4.0 RDBMS Base Bug 3221945
Abstract: CLEAN-UP OF ENTRIES IN COL_USAGE$
Base Bug 3221945 9.2.0.3 RDBMS
Abstract: ORA-1631 ON COL_USAGE$
Closed as "Not a Bug"
However, when a table is dropped, the column usage statistics are not dropped. They are left as they are.
When the database is shutdown (in normal mode), then these "orphaned" column usage entries are deleted. The code
which does this gets called only during normal shutdown.
Unless and until the database is shutdown, the col_usage$ table will continue to grow.
Solution
To implement the workaround, please execute the following steps:
1. Periodically (eg once a day) run exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO will clean out redundant col_usage$ entries, and when
you come to shutdown the database you should not have a huge number of entries left to clean up.

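The note points out that when the instance shuts down, SMON sets about cleaning the "orphaned" predicate-column records in col_usage$ that belong to dropped tables. If many intermediate tables were created and eventually dropped during the life of the instance, col_usage$ will have accumulated a large number of "orphaned" records, and SMON needs a long time to finish the cleanup, which slows the shutdown. The note also points out that running DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO periodically cleans redundant records out of col_usage$ as well.

Let's observe SMON's cleanup work: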

begin
  for i in 1 .. 5000 loop
    execute immediate 'create table maclean1' || i ||' tablespace fragment as select 1 t1 from dual';
    execute immediate 'select * from maclean1' || i || ' where t1=1';
  end loop;
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
  for i in 1 .. 5000 loop
    execute immediate 'drop table maclean1' || i;
  end loop;
end;
/
SQL> purge dba_recyclebin;
DBA Recyclebin purged.
The following query returns the total number of orphaned records in col_usage$, which is also the number
of records SMON has to clean up at instance shutdown
  select count(*)
    from sys.col_usage$ c
   where not exists (select /*+ unnest */
           1
            from sys.obj$ o
           where o.obj# = c.obj#);
  COUNT(*)
----------
     10224
Run a 10046 level 12 trace on SMON
SQL> oradebug setospid 30225;
Oracle pid: 8, Unix process pid: 30225, image: [email protected] (SMON)
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.
SQL> shutdown immediate;
=================10046 trace content==================
lock table sys.col_usage$ in exclusive mode nowait
delete from sys.col_usage$ where obj#= :1 and intcol#= :2
delete from sys.col_usage$ c
 where not exists (select /*+ unnest */
         1
          from sys.obj$ o
         where o.obj# = c.obj#)

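How to Stop SMON from Maintaining the col_usage$ Dictionary Base Table

1. Set the hidden parameter _column_tracking_level (column usage tracking). It defaults to 1, which enables column usage tracking. Setting it to 0 disables column tracking. The parameter can be changed dynamically at the session or system level: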

SQL> col name for a25
SQL> col DESCRIB for a25
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%_column_tracking_level%';
NAME                      VALUE      DESCRIB
------------------------- ---------- -------------------------
_column_tracking_level    1          column usage tracking
SQL> alter session set "_column_tracking_level"=0 ;
Session altered.
SQL> alter system set "_column_tracking_level"=0 scope=both;
System altered.

2. Turn off DML monitoring by setting the hidden parameter _dml_monitoring_enabled (enable modification monitoring) to false. Disabling DML monitoring has a considerable impact on the CBO, so the first approach is generally recommended:

SQL> SELECT monitoring, count(*) from DBA_TABLES group by monitoring;
MON   COUNT(*)
--- ----------
NO          79
YES       2206
SQL> alter system set "_dml_monitoring_enabled"=false;
System altered.
SQL> SELECT monitoring, count(*) from DBA_TABLES group by monitoring;
MON   COUNT(*)
--- ----------
NO        2285
In fact, the monitoring column of dba_tables is derived from the internal parameter _dml_monitoring_enabled
SQL> set long 99999
SQL> select text from dba_views where view_name='DBA_TABLES';
TEXT
--------------------------------------------------------------------------------
select u.name, o.name, decode(bitand(t.property,2151678048), 0, ts.name, null),
       decode(bitand(t.property, 1024), 0, null, co.name),
       decode((bitand(t.property, 512)+bitand(t.flags, 536870912)),
              0, null, co.name),
       decode(bitand(t.trigflag, 1073741824), 1073741824, 'UNUSABLE', 'VALID'),
       decode(bitand(t.property, 32+64), 0, mod(t.pctfree$, 100), 64, 0, null),
       decode(bitand(ts.flags, 32), 32, to_number(NULL),
          decode(bitand(t.property, 32+64), 0, t.pctused$, 64, 0, null)),
       decode(bitand(t.property, 32), 0, t.initrans, null),
       decode(bitand(t.property, 32), 0, t.maxtrans, null),
       s.iniexts * ts.blocksize,
       decode(bitand(ts.flags, 3), 1, to_number(NULL),
                                      s.extsize * ts.blocksize),
       s.minexts, s.maxexts,
       decode(bitand(ts.flags, 3), 1, to_number(NULL),
                                      s.extpct),
       decode(bitand(ts.flags, 32), 32, to_number(NULL),
         decode(bitand(o.flags, 2), 2, 1, decode(s.lists, 0, 1, s.lists))),
       decode(bitand(ts.flags, 32), 32, to_number(NULL),
         decode(bitand(o.flags, 2), 2, 1, decode(s.groups, 0, 1, s.groups))),
       decode(bitand(t.property, 32+64), 0,
                decode(bitand(t.flags, 32), 0, 'YES', 'NO'), null),
       decode(bitand(t.flags,1), 0, 'Y', 1, 'N', '?'),
       t.rowcnt,
       decode(bitand(t.property, 64), 0, t.blkcnt, null),
       decode(bitand(t.property, 64), 0, t.empcnt, null),
       t.avgspc, t.chncnt, t.avgrln, t.avgspc_flb,
       decode(bitand(t.property, 64), 0, t.flbcnt, null),
       lpad(decode(t.degree, 32767, 'DEFAULT', nvl(t.degree,1)),10),
       lpad(decode(t.instances, 32767, 'DEFAULT', nvl(t.instances,1)),10),
       lpad(decode(bitand(t.flags, 8), 8, 'Y', 'N'),5),
       decode(bitand(t.flags, 6), 0, 'ENABLED', 'DISABLED'),
       t.samplesize, t.analyzetime,
       decode(bitand(t.property, 32), 32, 'YES', 'NO'),
       decode(bitand(t.property, 64), 64, 'IOT',
               decode(bitand(t.property, 512), 512, 'IOT_OVERFLOW',
               decode(bitand(t.flags, 536870912), 536870912, 'IOT_MAPPING', null
))),
       decode(bitand(o.flags, 2), 0, 'N', 2, 'Y', 'N'),
       decode(bitand(o.flags, 16), 0, 'N', 16, 'Y', 'N'),
       decode(bitand(t.property, 8192), 8192, 'YES',
              decode(bitand(t.property, 1), 0, 'NO', 'YES')),
       decode(bitand(o.flags, 2), 2, 'DEFAULT',
             decode(s.cachehint, 0, 'DEFAULT', 1, 'KEEP', 2, 'RECYCLE', NULL)),
       decode(bitand(t.flags, 131072), 131072, 'ENABLED', 'DISABLED'),
       decode(bitand(t.flags, 512), 0, 'NO', 'YES'),
       decode(bitand(t.flags, 256), 0, 'NO', 'YES'),
       decode(bitand(o.flags, 2), 0, NULL,
          decode(bitand(t.property, 8388608), 8388608,
                 'SYS$SESSION', 'SYS$TRANSACTION')),
       decode(bitand(t.flags, 1024), 1024, 'ENABLED', 'DISABLED'),
       decode(bitand(o.flags, 2), 2, 'NO',
           decode(bitand(t.property, 2147483648), 2147483648, 'NO',
              decode(ksppcv.ksppstvl, 'TRUE', 'YES', 'NO'))),
       decode(bitand(t.property, 1024), 0, null, cu.name),
       decode(bitand(t.flags, 8388608), 8388608, 'ENABLED', 'DISABLED'),
       decode(bitand(t.property, 32), 32, null,
                decode(bitand(s.spare1, 2048), 2048, 'ENABLED', 'DISABLED')),
       decode(bitand(o.flags, 128), 128, 'YES', 'NO')
from sys.user$ u, sys.ts$ ts, sys.seg$ s, sys.obj$ co, sys.tab$ t, sys.obj$ o,
     sys.obj$ cx, sys.user$ cu, x$ksppcv ksppcv, x$ksppi ksppi
where o.owner# = u.user#
  and o.obj# = t.obj#
  and bitand(t.property, 1) = 0
  and bitand(o.flags, 128) = 0
  and t.bobj# = co.obj# (+)
  and t.ts# = ts.ts#
  and t.file# = s.file# (+)
  and t.block# = s.block# (+)
  and t.ts# = s.ts# (+)
  and t.dataobj# = cx.obj# (+)
  and cx.owner# = cu.user# (+)
  and ksppi.indx = ksppcv.indx
  and ksppi.ksppinm = '_dml_monitoring_enabled'

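SMON Functions You May Not Know: Recover Dead Transaction

SMON's duties also include cleaning up dead transactions (Recover Dead transaction). A dead transaction arises when a server process terminates unexpectedly before committing its transaction. The PMON process polls Oracle processes, finds such dead processes, and notifies SMON to roll back and clean up the dead transaction associated with the dead process. PMON is also responsible for recovering the locks and latches the dead process was holding.

Let's look at how a dead transaction is recovered: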

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> select  * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
SQL>alter system set fast_start_parallel_rollback=false;
System altered.
Set events 10500 and 10046 to trace the behavior of the SMON process
SQL> alter system set events '10500 trace name context forever,level 8';
System altered.
SQL> oradebug setospid 4424
Oracle pid: 8, Unix process pid: 4424, image: [email protected] (SMON)
SQL> oradebug event 10046 trace name context forever,level 8;
Statement processed.
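In a new terminal, run a large delete statement. After it has run for a while, use an operating system command to kill the
server process executing the delete, simulating a large dead transaction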
SQL> delete large_rb;
delete large_rb
[oracle@rh2 bdump]$ kill -9 4535
A few seconds later the pmon process finds the dead process:
[claim lock for dead process][lp 0x7000003c70ceff0][p 0x7000003ca63dad8.1290666][hist x9a514951]
Rows whose ktuxecfl (Transaction flags) is marked DEAD appear in the internal view x$ktuxe:
SQL> select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD';
SUM(DISTINCT(KTUXESIZ))
-----------------------
                  29386
SQL> /
SUM(DISTINCT(KTUXESIZ))
-----------------------
                  28816
KTUXESIZ above is the number of undo blocks used by the transaction
==================smon trace content==================
SMON: system monitor process posted
WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332801935
WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332815568
WAIT #0: nam='latch: row cache objects' ela= 95 address=2979418792 number=200 tries=1 obj#=1 tim=1278243333332734
WAIT #0: nam='latch: row cache objects' ela= 83 address=2979418792 number=200 tries=1 obj#=1 tim=1278243333356173
WAIT #0: nam='latch: undo global data' ela= 104 address=3066991984 number=187 tries=1 obj#=1 tim=1278243347987705
WAIT #0: nam='latch: object queue header operation' ela= 89 address=3094817048 number=131 tries=0 obj#=1 tim=1278243362468042
WAIT #0: nam='log file switch (checkpoint incomplete)' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243419588202
Dead transaction 0x00c2.008.0000006d recovered by SMON
=====================
PARSING IN CURSOR #3 len=358 dep=1 uid=0 oct=3 lid=0 tim=1278243423594568 hv=3186851936 ad='ae82c1b8'
select smontabv.cnt,
       smontab.time_mp,
       smontab.scn,
       smontab.num_mappings,
       smontab.tim_scn_map,
       smontab.orig_thread
  from smon_scn_time smontab,
       (select max(scn) scnmax,
               count(*) + sum(NVL2(TIM_SCN_MAP, NUM_MAPPINGS, 0)) cnt
          from smon_scn_time
         where thread = 0) smontabv
 where smontab.scn = smontabv.scnmax
   and thread = 0
END OF STMT
PARSE #3:c=0,e=1354526,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1278243423594556
EXEC #3:c=0,e=106,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1278243423603269
FETCH #3:c=0,e=47065,p=0,cr=319,cu=0,mis=0,r=1,dep=1,og=4,tim=1278243423650375
*** 2011-06-24 21:19:25.899
WAIT #0: nam='smon timer' ela= 299999999 sleep time=300 failed=0 p3=0 obj#=1 tim=1278243716699171
kglScanDependencyHandles4Unpin():
  cumscan=3 cumupin=4 time=776 upinned=0

The SMON rollback and cleanup of the dead transaction above starts at "system monitor process posted" and ends at "Dead transaction 0x00c2.008.0000006d recovered by SMON". During recovery SMON successively requested three different types of latch: 'latch: row cache objects', 'latch: undo global data', and 'latch: object queue header operation'.
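The two KTUXESIZ samples shown earlier (29386, then 28816 undo blocks) can be turned into a rough estimate of the remaining rollback time. The following is a minimal sketch, assuming a constant rollback rate; the function name and the 5-second sampling interval are hypothetical, since the transcript does not record how far apart the two queries were run.

```python
def estimate_rollback_seconds(first_blocks, second_blocks, interval_seconds):
    """Estimate the seconds left, assuming undo blocks are applied at a constant rate.

    first_blocks / second_blocks are two KTUXESIZ samples taken
    interval_seconds apart. Returns None when no progress was observed.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    recovered = first_blocks - second_blocks
    if recovered <= 0:
        return None  # no progress between samples; nothing to extrapolate
    rate = recovered / interval_seconds  # undo blocks rolled back per second
    return second_blocks / rate


# Samples from the transcript; the 5-second interval is an assumed value.
remaining = estimate_rollback_seconds(29386, 28816, 5)
print(f"about {remaining:.0f} seconds of rollback left")
```

If the estimate does not shrink between successive samples, the rollback is probably stalled rather than merely slow.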

Symptoms

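The fast_start_parallel_rollback parameter determines the degree of parallelism SMON uses when rolling back transactions. Setting it to FALSE disables parallel rollback; LOW (the default) rolls back with a degree of parallelism of 2*CPU_COUNT; HIGH brings in 4*CPU_COUNT rollback processes. When the following query shows that a large dead transaction needs to be rolled back, we can set fast_start_parallel_rollback to HIGH to speed up recovery: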

select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD';
==============parallel transaction recovery===============
*** 2011-06-24 20:31:01.765
SMON: system monitor process posted msgflag:0x0000 (-/-/-/-/-/-/-)
*** 2011-06-24 20:31:01.765
SMON: process sort segment requests begin
*** 2011-06-24 20:31:01.765
SMON: process sort segment requests end
*** 2011-06-24 20:31:01.765
SMON: parallel transaction recovery begin
WAIT #0: nam='DFS lock handle' ela= 504 type|mode=1413545989 id1=3 id2=11 obj#=2 tim=1308918661765715
WAIT #0: nam='DFS lock handle' ela= 346 type|mode=1413545989 id1=3 id2=12 obj#=2 tim=1308918661766135
WAIT #0: nam='DFS lock handle' ela= 565 type|mode=1413545989 id1=3 id2=13 obj#=2 tim=1308918661766758
WAIT #0: nam='DFS lock handle' ela= 409 type|mode=1413545989 id1=3 id2=14 obj#=2 tim=1308918661767221
WAIT #0: nam='DFS lock handle' ela= 332 type|mode=1413545989 id1=3 id2=15 obj#=2 tim=1308918661767746
WAIT #0: nam='DFS lock handle' ela= 316 type|mode=1413545989 id1=3 id2=16 obj#=2 tim=1308918661768146
WAIT #0: nam='DFS lock handle' ela= 349 type|mode=1413545989 id1=3 id2=17 obj#=2 tim=1308918661768549
WAIT #0: nam='DFS lock handle' ela= 258 type|mode=1413545989 id1=3 id2=18 obj#=2 tim=1308918661768858
WAIT #0: nam='DFS lock handle' ela= 310 type|mode=1413545989 id1=3 id2=19 obj#=2 tim=1308918661769224
WAIT #0: nam='DFS lock handle' ela= 281 type|mode=1413545989 id1=3 id2=20 obj#=2 tim=1308918661769555
*** 2011-06-24 20:31:01.769
SMON: parallel transaction recovery end
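The type|mode value in the 'DFS lock handle' waits above can be decoded by hand. The sketch below assumes the value uses the usual enqueue packing (two ASCII characters of the lock type in the high bytes, the requested mode in the low 16 bits); the function name is made up for illustration.

```python
def decode_type_mode(p1):
    """Split a packed type|mode wait parameter into (lock type, mode)."""
    lock_type = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)
    mode = p1 & 0xFFFF
    return lock_type, mode


# 1413545989 is 0x54410005: 'T' (0x54), 'A' (0x41), mode 5.
print(decode_type_mode(1413545989))
```

Under that assumption the waits in this trace are on a TA lock requested in mode 5.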

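In real-world practice, however, with fast_start_parallel_rollback set to LOW or HIGH (parallel rollback enabled), the parallel processes often block one another over various resources and the rollback stalls. When you hit this problem, setting fast_start_parallel_rollback to FALSE generally ensures that recovery completes serially, although it takes longer.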

How to stop SMON from recovering dead transactions

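Event 10513 can be set to temporarily stop SMON from recovering dead transactions. This is extremely useful during certain abnormal recovery operations, but setting this event in a normal production environment is not recommended: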

SQL> alter system set events '10513 trace name context forever, level 2';
System altered.
10513 -- event disables transaction recovery which was initiated by SMON
SQL> select ktuxeusn,
  2         to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS') "Time",
  3         ktuxesiz,
  4         ktuxesta
  5    from x$ktuxe
  6   where ktuxecfl = 'DEAD';
  KTUXEUSN Time                         KTUXESIZ KTUXESTA
---------- -------------------------- ---------- ----------------
        17 24-JUN-2011 22:03:10                0 INACTIVE
        66 24-JUN-2011 22:03:10                0 INACTIVE
       105 24-JUN-2011 22:03:10                0 INACTIVE
       193 24-JUN-2011 22:03:10            33361 ACTIVE
       194 24-JUN-2011 22:03:10                0 INACTIVE
       194 24-JUN-2011 22:03:10                0 INACTIVE
       197 24-JUN-2011 22:03:10            20171 ACTIVE
7 rows selected.
SQL> /
  KTUXEUSN Time                         KTUXESIZ KTUXESTA
---------- -------------------------- ---------- ----------------
        17 24-JUN-2011 22:03:10                0 INACTIVE
        66 24-JUN-2011 22:03:10                0 INACTIVE
       105 24-JUN-2011 22:03:10                0 INACTIVE
       193 24-JUN-2011 22:03:10            33361 ACTIVE
       194 24-JUN-2011 22:03:10                0 INACTIVE
       194 24-JUN-2011 22:03:10                0 INACTIVE
       197 24-JUN-2011 22:03:10            20171 ACTIVE
7 rows selected.
================smon disabled trans recover trace==================
SMON: system monitor process posted
*** 2011-06-24 22:02:57.980
SMON: Event 10513 is level 2, trans recovery disabled.

SMON functions you may not know about: cleaning up the IND$ dictionary base table

SMON is also responsible for cleaning up the IND$ dictionary base table (cleanup ind$):

Trigger scenario

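When we create or rebuild an index online (create or rebuild index online), the server process changes the FLAGS column of the index's record in the IND$ dictionary base table to decimal 256 or 512 (see the figure above: 0x100=256, 0x200=512), for example: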

SQL> create index macleans_index on larges(owner,object_name) online;
SQL> select obj# from obj$ where name='MACLEANS_INDEX';
      OBJ#
----------
   1343842
SQL> select FLAGS from ind$ where obj#=1343842;
     FLAGS
----------
       256
The ind_online$ dictionary base table records the history of online index creation/rebuild:
SQL> select * from ind_online$;
      OBJ#      TYPE#      FLAGS
---------- ---------- ----------
   1343839          1        256
   1343842          1        256
create table ind_online$
( obj#          number not null,
  type#         number not null,              /* what kind of index is this? */
                                                               /* normal : 1 */
                                                               /* bitmap : 2 */
                                                              /* cluster : 3 */
                                                            /* iot - top : 4 */
                                                         /* iot - nested : 5 */
                                                            /* secondary : 6 */
                                                                 /* ansi : 7 */
                                                                  /* lob : 8 */
                                             /* cooperative index method : 9 */
  flags         number not null
                                      /* index is being online built : 0x100 */
                                    /* index is being online rebuilt : 0x200 */
)
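The two flag bits documented in the ind_online$ definition above can be checked with a small helper. This is only an illustrative sketch: the constant and function names are made up, and only the 0x100 and 0x200 bits named in the DDL comments are interpreted.

```python
ONLINE_BUILD = 0x100    # "index is being online built"
ONLINE_REBUILD = 0x200  # "index is being online rebuilt"


def describe_online_flags(flags):
    """Return the online build/rebuild states encoded in a FLAGS value."""
    states = []
    if flags & ONLINE_BUILD:
        states.append("online build")
    if flags & ONLINE_REBUILD:
        states.append("online rebuild")
    return states


print(describe_online_flags(256))
```

The FLAGS value 256 seen for MACLEANS_INDEX therefore corresponds to an online build.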

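In principle, the cleanup for an online create/rebuild index is done by the server process that performs the operation. When the DDL statement succeeds, this cleanup consists of a series of data dictionary maintenance steps; when the DDL statement fails, it consists of cleaning up the temporary segment and maintaining the data dictionary. In either case the online journal table SYS_JOURNAL_nnnnn (nnnnn being the index's obj#) must be dropped. Data dictionary maintenance includes restoring the FLAGS bits of the index's record in IND$. If the server process terminates unexpectedly while the statement is running, however, the FLAGS column cannot be restored in the short term, and subsequent operations on the index fail with ORA-08104: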

SQL> drop index macleans_index;
drop index macleans_index
           *
ERROR at line 1:
ORA-08104: this index object 1343842 is being online built or rebuilt
08104, 00000, "this index object %s is being online built or rebuilt"
// *Cause:  the index is being created or rebuild or waited for recovering
//          from the online (re)build
// *Action: wait the online index build or recovery to complete

After startup, SMON cleans up the records left in IND$ by failed online index creations/rebuilds once every hour. This cleanup is driven by the kdicclean function (kdicclean is run by smon every 1 hour; called from SMON to find if there is any online builder death and cleanup our ind$ and obj$ and drop the journal table, stop journaling).
A typical call stack for this cleanup is:

ksbrdp -> ktmSmonMain -> ktmmon -> kdicclean -> kdic_cleanup -> ktssdrp_segment

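Note that the SMON cleanup runs only once an hour, and under heavy workload the records may in practice not be cleaned up for a long time. In such cases we usually want to finish the online index creation or rebuild as soon as possible. In 10gR2 and later we can call dbms_repair.online_index_clean directly to manually clean up what a failed online index rebuild left behind: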

SQL> drop index macleans_index;
drop index macleans_index
           *
ERROR at line 1:
ORA-08104: this index object 1343842 is being online built or rebuilt
DECLARE
 isClean BOOLEAN;
BEGIN
  isClean := FALSE;
  WHILE isClean=FALSE
  LOOP
    isClean := dbms_repair.online_index_clean(
    dbms_repair.all_index_id, dbms_repair.lock_wait);
    dbms_lock.sleep(10);
  END LOOP;
END;
/
SQL>  drop index macleans_index;
 drop index macleans_index
            *
ERROR at line 1:
ORA-01418: specified index does not exist
The cleanup succeeded.

In 9i it is more troublesome. You can try the following method (not really recommended unless you have already waited a long time):

1. First drop the online journal table manually. The following query finds the name of this intermediate table:
select object_name
  from dba_objects
 where object_name like
       (select '%' || object_id || '%'
          from dba_objects
         where object_name = '&INDEX_NAME')
/
Enter value for index_name: MACLEANS_INDEX
old   6:          where object_name = '&INDEX_NAME')
new   6:          where object_name = 'MACLEANS_INDEX')
OBJECT_NAME
--------------------------------------------------------------------------------
SYS_JOURNAL_1343845
SQL> drop table SYS_JOURNAL_1343845;
Table dropped.
2. Next, manually modify the IND$ dictionary base table
!!!!!! Caution! Be extremely careful when modifying the data dictionary manually!!
select flags from ind$ where obj#=&INDEX_OBJECT_ID;
Enter value for index_object_id: 1343845
old   1: select flags from ind$ where obj#=&INDEX_OBJECT_ID
new   1: select flags from ind$ where obj#=1343845
     FLAGS
----------
       256
a) For online create index, manually delete the corresponding record
delete from IND$ where obj#=&INDEX_OBJECT_ID
b) For online rebuild index, manually restore the FLAGS bits of the corresponding record
update IND$ set FLAGS=FLAGS-512 where obj#=&INDEX_OBJECT_ID

Next, let's observe the details of the cleanup in practice:

SQL> select obj# from obj$ where name='MACLEANS_INDEX';
      OBJ#
----------
   1343854
SQL> select FLAGS from ind$ where obj#=1343854;
     FLAGS
----------
       256
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 8;
Statement processed.
SQL> DECLARE
  2   isClean BOOLEAN;
  3  BEGIN
  4    isClean := FALSE;
  5    WHILE isClean=FALSE
  6    LOOP
  7      isClean := dbms_repair.online_index_clean(
  8      dbms_repair.all_index_id, dbms_repair.lock_wait);
  9
 10      dbms_lock.sleep(10);
 11    END LOOP;
 12  END;
 13  /
PL/SQL procedure successfully completed.
===============================10046 trace=============================
select i.obj#, i.flags, u.name, o.name, o.type#
  from sys.obj$ o, sys.user$ u, sys.ind_online$ i
 where (bitand(i.flags, 256) = 256 or bitand(i.flags, 512) = 512)
   and (not ((i.type# = 9) and bitand(i.flags, 8) = 8))
   and o.obj# = i.obj#
   and o.owner# = u.user#
select u.name,
       o.name,
       o.namespace,
       o.type#,
       decode(bitand(i.property, 1024), 0, 0, 1)
  from ind$ i, obj$ o, user$ u
 where i.obj# = :1
   and o.obj# = i.bo#
   and o.owner# = u.user#
delete from object_usage
 where obj# in (select a.obj#
                  from object_usage a, ind$ b
                 where a.obj# = b.obj#
                   and b.bo# = :1)
drop table "SYS"."SYS_JOURNAL_1343854" purge
delete from icoldep$ where obj# in (select obj# from ind$ where bo#=:1)
delete from ind$ where bo#=:1
delete from ind$ where obj#=:1

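The following statement finds IND$ records in the system that may need recovery. Note that a non-empty result is not necessarily a sign of a failed operation; someone may simply be creating or rebuilding an index online at that moment: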

select i.obj#, i.flags, u.name, o.name, o.type#
  from sys.obj$ o, sys.user$ u, sys.ind_online$ i
 where (bitand(i.flags, 256) = 256 or bitand(i.flags, 512) = 512)
   and (not ((i.type# = 9) and bitand(i.flags, 8) = 8))
   and o.obj# = i.obj#
   and o.owner# = u.user#
/

Related diagnostic events

Setting the diagnostic event event='8105 trace name context forever'
stops SMON from cleaning up IND$ (Oracle event to turn off smon cleanup for online index build):

alter system set events '8105 trace name context forever';

SMON functions you may not know about: maintaining the MON_MODS$ dictionary base table

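The SMON background process also maintains the MON_MODS$ base table. When the initialization parameter STATISTICS_LEVEL is set to TYPICAL or ALL, Oracle's table monitoring feature is enabled by default. Oracle tracks the INSERT, UPDATE, and DELETE operations on each table since it was last analyzed, as well as whether the table has been truncated, and records approximate counts of these operations in the data dictionary base table MON_MODS$. The commonly used DML view dba_tab_modifications actually draws its data from another dictionary base table, MON_MODS_ALL$; SMON periodically MERGEs the qualifying rows of MON_MODS$ into MON_MODS_ALL$.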

Rem DML monitoring
create table mon_mods$
(
  obj#              number,                                 /* object number */
  inserts           number,  /* approx. number of inserts since last analyze */
  updates           number,  /* approx. number of updates since last analyze */
  deletes           number,  /* approx. number of deletes since last analyze */
  timestamp         date,     /* timestamp of last time this row was changed */
  flags             number,                                         /* flags */
                                           /* 0x01 object has been truncated */
  drop_segments     number   /* number of segemnt in part/subpartition table */
)
  storage (initial 200K next 100k maxextents unlimited pctincrease 0)
/
create unique index i_mon_mods$_obj on mon_mods$(obj#)
  storage (maxextents unlimited)
/
Rem DML monitoring, has info aggregated to global level for paritioned objects
create table mon_mods_all$
(
  obj#              number,                                 /* object number */
  inserts           number,  /* approx. number of inserts since last analyze */
  updates           number,  /* approx. number of updates since last analyze */
  deletes           number,  /* approx. number of deletes since last analyze */
  timestamp         date,     /* timestamp of last time this row was changed */
  flags             number,                                         /* flags */
                                           /* 0x01 object has been truncated */
  drop_segments     number   /* number of segemnt in part/subpartition table */
)
  storage (initial 200K next 100k maxextents unlimited pctincrease 0)
/
create unique index i_mon_mods_all$_obj on mon_mods_all$(obj#)
  storage (maxextents unlimited)
/
Rem =========================================================================
Rem End Usage monitoring tables
Rem =========================================================================
VIEW DBA_TAB_MODIFICATIONS
select u.name, o.name, null, null,
       m.inserts, m.updates, m.deletes, m.timestamp,
       decode(bitand(m.flags,1),1,'YES','NO'),
       m.drop_segments
from sys.mon_mods_all$ m, sys.obj$ o, sys.tab$ t, sys.user$ u
where o.obj# = m.obj# and o.obj# = t.obj# and o.owner# = u.user#
union all
select u.name, o.name, o.subname, null,
       m.inserts, m.updates, m.deletes, m.timestamp,
       decode(bitand(m.flags,1),1,'YES','NO'),
       m.drop_segments
from sys.mon_mods_all$ m, sys.obj$ o, sys.user$ u
where o.owner# = u.user# and o.obj# = m.obj# and o.type#=19
union all
select u.name, o.name, o2.subname, o.subname,
       m.inserts, m.updates, m.deletes, m.timestamp,
       decode(bitand(m.flags,1),1,'YES','NO'),
       m.drop_segments
from sys.mon_mods_all$ m, sys.obj$ o, sys.tabsubpart$ tsp, sys.obj$ o2,
     sys.user$ u
where o.obj# = m.obj# and o.owner# = u.user# and
      o.obj# = tsp.obj# and o2.obj# = tsp.pobj#

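Symptoms:

Every 15 minutes the SMON background process flushes the DML statistics held in the SGA to the SYS.MON_MODS$ base table (SMON flush every 15 minutes to SYS.MON_MODS$).
At the same time it MERGEs the qualifying rows of SYS.MON_MODS$ into MON_MODS_ALL$ and clears them from MON_MODS$.
MON_MODS_ALL$ is the data source of the dba_tab_modifications view and supports statistics gathering; see my article <Does GATHER_STATS_JOB gather all objects' stats every time?>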

The operations SMON uses to flush the DML statistics to SYS.MON_MODS$, merge them into MON_MODS_ALL$, and purge the data are as follows:

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
/* populate the mon_mods$ dictionary base table */
lock table sys.mon_mods$ in exclusive mode nowait
insert into sys.mon_mods$
  (obj#, inserts, updates, deletes, timestamp, flags, drop_segments)
values
  (:1, :2, :3, :4, :5, :6, :7)
update sys.mon_mods$
   set inserts       = inserts + :ins,
       updates       = updates + :upd,
       deletes       = deletes + :del,
       flags        =
       (decode(bitand(flags, :flag), :flag, flags, flags + :flag)),
       drop_segments = drop_segments + :dropseg,
       timestamp     = :time
 where obj# = :objn
lock table sys.mon_mods_all$ in exclusive mode
/* The merge below merges the rows of mon_mods$ into mon_mods_all$:
   for a matching row it adds to the existing inserts, updates, and deletes totals,
   otherwise it inserts a new row
*/
merge /*+ dynamic_sampling(mm 4) dynamic_sampling_est_cdn(mm)                           
dynamic_sampling(m 4) dynamic_sampling_est_cdn(m) */
into sys.mon_mods_all$ mm
using (select m.obj#          obj#,
              m.inserts       inserts,
              m.updates       updates,
              m.deletes       deletes,
              m.flags         flags,
              m.timestamp     timestamp,
              m.drop_segments drop_segments from sys.mon_mods$ m,
              tab$            t where m.obj# = t.obj#) v
on (mm.obj# = v.obj#)
when matched then
  update
     set mm.inserts       = mm.inserts + v.inserts,
         mm.updates       = mm.updates + v.updates,
         mm.deletes       = mm.deletes + v.deletes,
         mm.flags         = mm.flags + v.flags - bitand(mm.flags, v.flags) /* bitor(mm.flags,v.flags) */,
         mm.timestamp     = v.timestamp,
         mm.drop_segments = mm.drop_segments + v.drop_segments
when NOT matched then
  insert
    (obj#, inserts, updates, deletes, timestamp, flags, drop_segments)
  values
    (v.obj#,
     v.inserts,
     v.updates,
     v.deletes,
     sysdate,
     v.flags,
     v.drop_segments)
/* finally, delete the corresponding rows from sys.mon_mods$ */
delete /*+ dynamic_sampling(m 4) dynamic_sampling_est_cdn(m) */
from sys.mon_mods$ m
 where exists (select /*+ unnest */
         *
          from sys.tab$ t
         where t.obj# = m.obj#)
  select obj#
    from sys.mon_mods$
   where obj# not in (select obj# from sys.obj$)
Used to have a FULL TABLE SCAN on obj$ associated with monitoring information 
extracted in conjunction with mon_mods$ executed by SMON periodically.
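The flags expression in the MERGE above, mm.flags + v.flags - bitand(mm.flags, v.flags), is a bitwise OR written with arithmetic, as its own comment hints. The sketch below models the merge-then-clear behaviour on plain dictionaries to make that visible. It is a simplified model, not Oracle's implementation: the function names are made up, and the timestamp column and the join to tab$ are left out.

```python
COUNTERS = ("inserts", "updates", "deletes", "drop_segments")


def bitor(a, b):
    """Same arithmetic as the MERGE: a + b - bitand(a, b)."""
    return a + b - (a & b)


def merge_mon_mods(mon_mods, mon_mods_all):
    """Fold the rows of mon_mods into mon_mods_all, then empty mon_mods.

    Both arguments map obj# to a dict of counters plus a 'flags' value.
    """
    for obj, row in mon_mods.items():
        target = mon_mods_all.get(obj)
        if target is None:
            mon_mods_all[obj] = dict(row)      # WHEN NOT MATCHED: insert
        else:
            for col in COUNTERS:               # WHEN MATCHED: accumulate
                target[col] += row[col]
            target["flags"] = bitor(target["flags"], row["flags"])
    mon_mods.clear()                           # the trailing DELETE
    return mon_mods_all


pending = {101: {"inserts": 5, "updates": 1, "deletes": 0, "drop_segments": 0, "flags": 1}}
history = {101: {"inserts": 10, "updates": 0, "deletes": 2, "drop_segments": 0, "flags": 1}}
merge_mon_mods(pending, history)
print(history[101])
```

Because the truncated flag is OR-ed rather than added, merging two rows that both carry flag 0x01 leaves the flag at 1 instead of overflowing to 2.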

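Because SMON, or a user calling the "DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO" stored procedure, needs an exclusive lock on the table when flushing DML data into mon_mods$ or mon_mods_all$, deadlocks can occur in a RAC environment.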

In addition, in earlier releases SMON's maintenance of the monitoring tables could cause a slow shutdown immediate or degraded system performance; see:

<Shutdown immediate hangs if table monitoring enabled on [ID 263217.1]>
<Bug 2806297 – SMON can cause bad system performance if TABLE MONITORING enabled on lots of tables [ID 2806297.8]>

Call stack when SMON maintains MON_MODS$

kglpnal <- kglpin <- kxsGetRuntimeLock
<- kksfbc <- kkspsc0 <- kksParseCursor <- opiosq0 <- opiall0
<- opikpr <- opiodr <- PGOSF175_rpidrus <- skgmstack <- rpiswu2
<- kprball <- kprbbnd0 <- kprbbnd <- ksxmfmel <- ksxmfm
<- ksxmfchk <- ksxmftim <- ktmmon <- ktmSmonMain <- ksbrdp
<- opirip <- opidrv <- sou2o <- opimai_real <- ssthrdmain
<- main <- libc_start_main <- start

How to stop SMON from maintaining MON_MODS$

Note that tables created in an environment with default parameters always have table monitoring enabled:

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> create table maclean1 (t1 int);          
Table created.
/* from 10g on, the nomonitoring and monitoring options no longer have any effect */
SQL> create table maclean2 (t1 int) nomonitoring;
Table created.
SQL>  select table_name,monitoring from dba_tables  where table_name like 'MACLEAN%';
TABLE_NAME                     MON
------------------------------ ---
MACLEAN1                       YES
MACLEAN2                       YES

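Normally there is no need to stop SMON from maintaining MON_MODS$, unless this maintenance leads to a slow shutdown, degraded performance, or SMON randomly terminating the instance during abnormal recovery.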

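Before 10g, the two options MONITORING and NOMONITORING controlled whether monitoring was enabled at the table level, and the dbms_stats.ALTER_SCHEMA_TAB_MONITORING('maclean',false) stored procedure controlled whether monitoring was enabled at the schema level. From 10g on these methods no longer work: the MONITORING and NOMONITORING options are deprecated (In 10g the MONITORING and NOMONITORING keywords are deprecated and will be ignored.), and their function is covered by the STATISTICS_LEVEL parameter.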

The table monitoring feature is now controlled entirely by the STATISTICS_LEVEL parameter:

- When STATISTICS_LEVEL is set to BASIC, table monitoring is disabled

- When STATISTICS_LEVEL is set to TYPICAL or ALL, table monitoring is enabled

In other words, setting STATISTICS_LEVEL to BASIC stops the SMON background process from performing this function. The commands to change the parameter are:

show parameter statistics_level
alter system set statistics_level = basic;

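Note, however, that if you are using the AMM or ASMM automatic memory management features, STATISTICS_LEVEL cannot be set to BASIC, because both the Auto-Memory and Auto-SGA features depend on the performance statistics that STATISTICS_LEVEL controls. If you must do it, first disable AMM and ASMM: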

 -- disable 11g AMM, have to bounce instance
 -- alter system set memory_target=0 scope=spfile;
 -- disable 10g ASMM
 alter system set sga_target=0;
 alter system set statistics_level = basic;

SMON functions you may not know about: maintaining the SMON_SCN_TIME dictionary base table

The SMON background process also maintains the SMON_SCN_TIME base table.

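The SMON_SCN_TIME base table records the mapping between SCNs (system change numbers) and timestamps over a past period. Because the mapping is sampled, SMON_SCN_TIME can locate the time of a given SCN only roughly (imprecisely). Physically, SMON_SCN_TIME is a cluster table.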

http://www.oracledatabase12g.com/wp-content/uploads/2011/11/smon_scn_time.png

The main use of the SMON_SCN_TIME time mapping table is to give flashback-type queries a way to map a time to an SCN (The SMON time mapping is mainly for flashback type queries to map a time to an SCN).

The Metalink note <Error ORA-01466 while executing a flashback query. [ID 281510.1]> describes how often SMON updates SMON_SCN_TIME:

In 10g, SMON_SCN_TIME is updated every 6 seconds (In Oracle Database 10g, smon_scn_time is updated every 6 seconds hence that is the minimum time that the flashback query time needs to be behind the timestamp of the first change to the table.)

In 9.2, SMON_SCN_TIME is updated every 5 minutes (In Oracle Database 9.2, smon_scn_time is updated every 5 minutes hence the required delay between the flashback time and table properties change is at least 5 minutes.)

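Also, starting with 10g SMON purges records from SMON_SCN_TIME. The SMON background process wakes up every 5 minutes and checks the total number of SMON_SCN_TIME mapping records on disk. If the total exceeds 144000, it deletes the oldest record (the one with the smallest time_mp) with the following statement: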

delete from smon_scn_time
 where thread = 0
   and time_mp = (select min(time_mp) from smon_scn_time where thread = 0)

If deleting a single record does not free enough space, SMON executes the DELETE statement above repeatedly.
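The purge rule described above (delete the row with the smallest time_mp, and repeat while the table is still over the limit) can be modelled as follows. This is a toy sketch over a list of time_mp values, not SMON's actual code; the function name is hypothetical, and only the 144000 threshold comes from the text.

```python
import heapq


def purge_oldest(time_mps, limit=144000):
    """Drop the smallest time_mp values until at most `limit` rows remain.

    Returns (remaining_rows_sorted, number_of_deletes_issued).
    """
    heap = list(time_mps)
    heapq.heapify(heap)
    deletes = 0
    while len(heap) > limit:
        heapq.heappop(heap)  # one DELETE of the row with min(time_mp)
        deletes += 1
    return sorted(heap), deletes


# Tiny limit so the behaviour is easy to see.
rows, deletes = purge_oldest([767146993, 767145793, 767147294, 767146393], limit=2)
print(rows, deletes)
```

Each loop iteration corresponds to one execution of the DELETE statement, which is why a badly oversized table shows up as a high execution count for that statement.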

Trigger scenario

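Although the Metalink note <Error ORA-01466 while executing a flashback query. [ID 281510.1]> states that in 10g SMON updates the SMON_SCN_TIME base table every 6 seconds, observation shows that the update frequency is tied to the SCN growth rate. On a busy instance where the SCN climbs very fast, SMON may update at the minimum interval of 6 seconds; on an idle instance where the SCN grows slowly, it still updates only once every 5 to 10 minutes. For example: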

[oracle@vrh8 ~]$ ps -ef|grep smon|grep -v grep
oracle    3484     1  0 Nov12 ?        00:00:02 ora_smon_G10R21
SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com & www.askmaclean.com
SQL> oradebug setospid 3484;
Oracle pid: 8, Unix process pid: 3484, image: [email protected] (SMON)
SQL> oradebug event 10500 trace name context forever,level 10 : 10046 trace name context forever,level 12;
Statement processed.
SQL>
SQL> oradebug tracefile_name;
/s01/admin/G10R21/bdump/g10r21_smon_3484.trc
/* wait for a while */

Find the records in the SMON trace file that insert data into SMON_SCN_TIME:

 grep -A20 "insert into smon_scn_time" /s01/admin/G10R21/bdump/g10r21_smon_3484.trc
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas, num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #4:c=0,e=43,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290280848899596
BINDS #4:
kkscoacd
Bind#0
 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
 oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fb29844edb8 bln=22 avl=06 flg=05
 value=767145793
Bind#1
 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
 kxsbbbfp=7fff023ae780 bln=07 avl=07 flg=09
 value="11/14/2011 0:3:13"
Bind#2
 oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fff023ae70c bln=22 avl=04 flg=09
 value=954389
Bind#3
--
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas, num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #1:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290281434933390
BINDS #1:
kkscoacd
Bind#0
 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
 oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fb29844edb8 bln=22 avl=06 flg=05
 value=767146393
Bind#1
 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
 kxsbbbfp=7fff023ae780 bln=07 avl=07 flg=09
 value="11/14/2011 0:13:13"
Bind#2
 oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fff023ae70c bln=22 avl=04 flg=09
 value=954720
Bind#3
--
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas, num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #3:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290281727955249
BINDS #3:
kkscoacd
Bind#0
 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
 oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fb29844e960 bln=22 avl=06 flg=05
 value=767146993
Bind#1
 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
 kxsbbbfp=7fff023ae780 bln=07 avl=07 flg=09
 value="11/14/2011 0:23:13"
Bind#2
 oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fff023ae70c bln=22 avl=04 flg=09
 value=954926
Bind#3
insert into smon_scn_time (thread, time_mp, time_dp, scn, scn_wrp, scn_bas, num_mappings, tim_scn_map)
values (0, :1, :2, :3, :4, :5, :6, :7)
END OF STMT
PARSE #4:c=0,e=30,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1290282313990553
BINDS #4:
kkscoacd
Bind#0
 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
 oacflg=00 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fb29844edb8 bln=22 avl=06 flg=05
 value=767147294
Bind#1
 oacdty=12 mxl=07(07) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=8 off=0
 kxsbbbfp=7fff023ae780 bln=07 avl=07 flg=09
 value="11/14/2011 0:28:14"
Bind#2
 oacdty=02 mxl=22(04) mxlc=00 mal=00 scl=00 pre=00
 oacflg=10 fl2=0001 frm=00 csi=00 siz=24 off=0
 kxsbbbfp=7fff023ae70c bln=22 avl=04 flg=09
 value=955036
Bind#3

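The TIME_DP bind values of the INSERT statements above reveal how often SMON_SCN_TIME is updated: generally once every 5 to 10 minutes. This shows that the update frequency of SMON_SCN_TIME depends on the load of the database instance; the shortest interval is 6 seconds and the longest is 10 minutes.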
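The intervals can be worked out directly from the four TIME_DP bind values in the trace excerpt. A minimal sketch follows; the function name is made up, and the date format is inferred from the values as they appear in the trace.

```python
from datetime import datetime


def update_intervals(stamps, fmt="%m/%d/%Y %H:%M:%S"):
    """Return the gaps, in seconds, between consecutive timestamps."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# TIME_DP bind values copied from the trace excerpt.
stamps = [
    "11/14/2011 0:3:13",
    "11/14/2011 0:13:13",
    "11/14/2011 0:23:13",
    "11/14/2011 0:28:14",
]
intervals = update_intervals(stamps)
print(intervals)
```

The gaps come out at roughly ten, ten, and five minutes, and they match the differences between the corresponding TIME_MP bind values (767145793, 767146393, 767146993, 767147294).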

The update frequency of SMON_SCN_TIME can lead to ORA-01466 errors; see:
Error ORA-01466 while executing a flashback query. [ID 281510.1]

Inconsistent data in SMON_SCN_TIME can lead to ORA-00600[6711] or to frequent execution of the "delete from smon_scn_time" statement; see:
<An Example of the ORA-00600[6711] Error>
High Executions Of Statement "delete from smon_scn_time…" [ID 375401.1]

Call stack when SMON maintains SMON_SCN_TIME; ktf_scn_time is the main function that updates SMON_SCN_TIME:

ksedst ksedmp ssexhd kghlkremf kghalo kghgex kghalf kksLoadChild kxsGetRuntimeLock kksfbc
kkspsc0 kksParseCursor opiosq0 opiall0 opikpr opiodr rpidrus skgmstack rpidru rpiswu2 kprball
ktf_scn_time
ktmmon ktmSmonMain ksbrdp opirip opidrv sou2o opimai_real main main_opd_entry

SMON may also use the following SQL statements to maintain the SMON_SCN_TIME dictionary base table:

select smontabv.cnt,
       smontab.time_mp,
       smontab.scn,
       smontab.num_mappings,
       smontab.tim_scn_map,
       smontab.orig_thread
  from smon_scn_time smontab,
       (select max(scn) scnmax,
               count(*) + sum(NVL2(TIM_SCN_MAP, NUM_MAPPINGS, 0)) cnt
          from smon_scn_time
         where thread = 0) smontabv
 where smontab.scn = smontabv.scnmax
   and thread = 0
insert into smon_scn_time
  (thread,
   time_mp,
   time_dp,
   scn,
   scn_wrp,
   scn_bas,
   num_mappings,
   tim_scn_map)
values
  (0, :1, :2, :3, :4, :5, :6, :7)
update smon_scn_time
   set orig_thread  = 0,
       time_mp      = :1,
       time_dp      = :2,
       scn          = :3,
       scn_wrp      = :4,
       scn_bas      = :5,
       num_mappings = :6,
       tim_scn_map  = :7
 where thread = 0
   and scn = (select min(scn) from smon_scn_time where thread = 0)
delete from smon_scn_time
 where thread = 0
   and scn = (select min(scn) from smon_scn_time where thread = 0)

How to stop SMON from updating the SMON_SCN_TIME base table

Setting the diagnostic event event='12500 trace name context forever, level 10' stops SMON from updating the SMON_SCN_TIME base table (Setting the 12500 event at system level should stop SMON from updating the SMON_SCN_TIME table.):

SQL>  alter system set events '12500 trace name context forever, level 10';
System altered.

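We generally do not recommend stopping SMON from updating SMON_SCN_TIME, because doing so interferes with the normal use of Flashback Query. In some abnormal recovery scenarios, however, corrupt SMON_SCN_TIME data can crash the instance, and event 12500 above can then be used to keep SMON_SCN_TIME from being updated.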

How to manually purge the data in SMON_SCN_TIME

Because SMON_SCN_TIME is not a bootstrap core object, we can manually update the data in this table and rebuild its indexes.

As I described in <An Example of the ORA-00600[6711] Error>, when SMON_SCN_TIME is inconsistent with its indexes, rebuilding the indexes resolves the problem:

connect / as sysdba
drop index smon_scn_time_scn_idx;
drop index smon_scn_time_tim_idx;
create unique index smon_scn_time_scn_idx on smon_scn_time(scn);
create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp);
analyze table smon_scn_time validate structure cascade;

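After setting event 12500 you can manually delete the records in SMON_SCN_TIME; once the instance is restarted, SMON resumes updating SMON_SCN_TIME normally. Unless an inconsistency between the SMON_SCN_TIME table and its indexes smon_scn_time_tim_idx or smon_scn_time_scn_idx prevents the DELETE statement from effectively removing records (the note <LOCK ON SYS.SMON_SCN_TIME [ID 747745.1]> describes this problem), there is no need to purge SMON_SCN_TIME manually.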

The procedure is as follows:

SQL> conn / as sysdba
/* Set the event at system level */
SQL> alter system set events '12500 trace name context forever, level 10';
/* Delete the records from SMON_SCN_TIME */
SQL> delete from smon_scn_time;
SQL> commit;
SQL> alter system set events '12500 trace name context off';
After completing the steps above, restart the instance:
shutdown immediate;
startup;

SMON functions you may not know about: OFFLINE UNDO SEGMENT

SMON, this veteran key background process, also maintains UNDO/ROLLBACK SEGMENTs. This maintenance takes two main forms: OFFLINE and SHRINK of UNDO/ROLLBACK SEGMENTs. Here we mainly cover OFFLINE ROLLBACK SEGMENT.

You will surely ask: why does Oracle OFFLINE UNDO/ROLLBACK SEGMENTs?

The main purpose is to relieve pressure on UNDO SPACE usage in environments with highly concurrent transactions.

Trigger scenario

In 9i (before 10g), every 12 hours SMON uses the records in V$UNDOSTAT to decide how many of the existing UNDO SEGMENTs to OFFLINE and how many to keep. In 9i the OFFLINED UNDO SEGMENTs are also DROPped by SMON to reclaim more space.

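The number of UNDO SEGMENTs kept equals the maximum number of concurrent transactions recorded by the V$UNDOSTAT dynamic view over the past 12 hours, plus 1. The formula is shown in the following SQL: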

SQL> select max(MAXCONCURRENCY)+1 from v$undostat where begin_time> (sysdate-1/2);
MAX(MAXCONCURRENCY)+1
---------------------
4
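
The same arithmetic can be sketched outside the database. The snippet below is only an illustration of the "maximum concurrency in the last 12 hours, plus 1" rule described above; the function name and the sample rows are hypothetical, and the rows stand in for (BEGIN_TIME, MAXCONCURRENCY) pairs from V$UNDOSTAT.

```python
from datetime import datetime, timedelta


def undo_segments_to_keep(samples, now, window_hours=12):
    """Return max(maxconcurrency) + 1 over the recent window.

    samples: iterable of (begin_time, maxconcurrency) pairs, standing in
    for rows of V$UNDOSTAT (hypothetical data, not read from a database).
    """
    cutoff = now - timedelta(hours=window_hours)
    recent = [mc for begin_time, mc in samples if begin_time > cutoff]
    if not recent:
        # Mirrors the NULL the SQL returns when no row falls in the window.
        return None
    return max(recent) + 1


now = datetime(2011, 6, 27, 12, 0)
samples = [
    (now - timedelta(hours=20), 9),  # older than 12 hours, ignored
    (now - timedelta(hours=6), 3),
    (now - timedelta(hours=1), 2),
]
print(undo_segments_to_keep(samples, now))  # 4
```

With these made-up rows the result is 4, the same value as in the SQL output above.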

If you find messages like the following in alert.log, OFFLINE UNDO SEGS has already happened on your system:

SMON offlining US=13
Freeing IMU pool for usn 13
SMON offlining US=14
SMON offlining US=15
SMON offlining US=16
SMON offlining US=17
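
To check a long alert.log for these messages, a small script can help. The sketch below assumes the message format is exactly the one shown in the sample above; the function name is hypothetical.

```python
import re

# Matches lines such as "SMON offlining US=13" from the sample above.
OFFLINE_RE = re.compile(r"^SMON offlining US=(\d+)$")


def offlined_undo_segments(alert_log_text):
    """Return the undo segment numbers SMON reported taking offline."""
    usns = []
    for line in alert_log_text.splitlines():
        match = OFFLINE_RE.match(line.strip())
        if match:
            usns.append(int(match.group(1)))
    return usns


sample = """SMON offlining US=13
Freeing IMU pool for usn 13
SMON offlining US=14
SMON offlining US=15
SMON offlining US=16
SMON offlining US=17
"""
print(offlined_undo_segments(sample))  # [13, 14, 15, 16, 17]
```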

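In 9i, SMON OFFLINEs UNDO SEGMENTs through the ktusmofd function; ktusmofd stands for [K]ernel [T]ransaction [U]ndo [S]ystem [M]anaged OFFLINE & DROP.
The ktsmgfru function returns the number of ONLINE UNDO SEGMENTs that must be kept. The detailed algorithm is as follows:

SMON calls ktusmofd and checks whether the instance has been up for less than 12 hours and _smu_debug_mode does not have the KTU_DEBUG_SMU_SMON_SHRINK flag set
(_smu_debug_mode is an internal SYSTEM MANAGED UNDO parameter; the KTU_DEBUG_SMU_SMON_SHRINK flag controls whether SMON is forced to SHRINK)
  YES - SMON returns immediately without OFFLINING anything
  NO  - call ktsmgfru to get the maximum number of concurrent transactions over the past 12 hours
        set the keep_online variable to the ktsmgfru return value plus 1
        try to hold the TA ENQUEUE (this enqueue serializes operations on the UNDO TABLESPACE); the timeout for this step is 30s
          if the ENQUEUE cannot be obtained, an UNDO TABLESPACE switch is in progress; ktusmofd returns without OFFLINING any UNDO SEGMENTS
          if the ENQUEUE is obtained, call ktusmofxu with the keep_online value as its argument and start the OFFLINE
            call kslgpl to get the KTU LATCH, including the parent and all children
            LOOP over the currently ONLINE UNDO SEGMENTs
              if the UNDO SEGMENT is SMU (SYSTEM MANAGED UNDO) and its tablespace is the one the current undo_tablespace points to
                if keep_online > 0 then keep_online--
                otherwise
                  release the KTU latches
                  call kturof1 to actually OFFLINE the UNDO SEGMENT
                  re-get the KTU latches
            END LOOP
            release the KTU latches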
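
The control flow above can be modelled in a few lines. This is only a toy model of the decision logic as described in this post, not Oracle's implementation: the class, the function and its parameters are hypothetical, and latching and the enqueue are reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass
class UndoSegment:
    usn: int
    system_managed: bool
    tablespace: str


def pick_segments_to_offline(segments, current_undo_ts, max_concurrency,
                             uptime_hours, force_shrink=False,
                             ta_enqueue_acquired=True):
    """Return the USNs that the 9i-style logic would take offline."""
    # Instance younger than 12 hours and debug flag not set: do nothing.
    if uptime_hours < 12 and not force_shrink:
        return []
    # TA enqueue not obtained within the timeout: an undo tablespace
    # switch is assumed to be in progress, so nothing is taken offline.
    if not ta_enqueue_acquired:
        return []
    keep_online = max_concurrency + 1
    to_offline = []
    for seg in segments:  # loop over the segments that are currently online
        if not (seg.system_managed and seg.tablespace == current_undo_ts):
            continue
        if keep_online > 0:
            keep_online -= 1
        else:
            to_offline.append(seg.usn)
    return to_offline


segments = [UndoSegment(usn, True, "UNDOTBS1") for usn in range(1, 11)]
print(pick_segments_to_offline(segments, "UNDOTBS1",
                               max_concurrency=3, uptime_hours=24))
# [5, 6, 7, 8, 9, 10]
```

With a maximum concurrency of 3, keep_online starts at 4, so the first four eligible segments stay online and the remaining six are offlined.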

A common STACK CALL when SMON calls ktusmofd to maintain OFFLINE UNDO SEGMENTs is:

ktmmon->ktusmofd->ktusmdxu->ktcrcm->ktccpcmt->ktcccdel->ktadrpc->ktssdro_segment->
ktssdrbm_segment->ktsxbmdelext->kqrcmt->ktsscu
xctrol ktcpoptx ktccpcmt ktcrcm ktusmdxu ktusmofd ktmmon
ksedmp ksfdmp kgeasnmierr ktusmgmct ktusmdxu ktusmofd ktmmon ksbrdp opirip 
opidrv sou2o main

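Before 10g the UNDO OFFLINE algorithm was still imperfect: after an instance restart or a switch of the UNDO TABLESPACE, the warm-up time needed to bring a sufficient number of UNDO SEGMENTs ONLINE could be as long as several minutes, a delay that is hard to accept in highly concurrent environments.

Starting with 10g the SMON OFFLINE UNDO SEGMENT algorithm was improved. SMON decides how many UNDO SEGMENTs to OFFLINE based on the past 7 days (rather than 12 hours) of information in the V$UNDOSTAT dynamic view, or on the UNDO usage history snapshots in AWR (the Automatic Workload Repository). In 10g SMON also no longer DROPs surplus UNDO SEGS; it only takes them OFFLINE. This improved SMU algorithm is called "Fast Ramp-Up", and it avoids the waits and performance problems that SMON's maintenance of UNDO SEGS caused in earlier versions. In addition, the unpublished BUG 5079978 may be triggered in version 10.2.0.1; its details are as follows: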

Unpublished
Bug 5079978 - APPST GSI 10G : - PRODUCTION INSTANCE UNUSABLE DUE TO US ENQUEUE WAITS
is fixed in 11.1 and patch set 10.2.0.4 and interim patches are available for several earlier versions.
Please refer to Note 5079978.8

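The bug above can be worked around with the 10511 event introduced later. Oracle also officially recommends using the 10511 event in versions before 10g to avoid the problems caused by SMON OFFLINING UNDO SEGS too aggressively.

The algorithm from 10g onward is as follows:

Has the instance been up for more than 7 days?
  YES - use the maximum concurrent transaction count max(maxconcurrency) over the past 7 days from v$undostat directly
  NO  - is this the first call to the kernel function that OFFLINEs UNDO SEGMENTs?
        YES - check whether select_workload_repository function (SWRF) snapshot data exists
              NO  - ONLINE the minimum number of UNDO SEGMENTS
              YES - try to get the maximum concurrent transaction count max(maxconcurrency) over the past 7 days from the AWR table wrh$_undostat
                    if that value cannot be obtained, try to read the maximum number of rollback segs max(rbs cnt) over the last 7 days from wrh$_rollstat
              save the returned value in an internal variable
        NO  - use the value in the internal variable directly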
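
The decision tree above can be written as a small function with a cached value. This is a sketch of the logic as listed in this post, under a few assumptions that the post does not confirm: "first call" is modelled as "nothing cached yet", missing AWR data is modelled as None, and the "minimum number" is a constructor argument because the post gives no value for it. All names are hypothetical.

```python
class RampUpEstimator:
    """Toy model of the 10g choice of how many undo segments to keep online."""

    def __init__(self, minimum_segments):
        self.minimum_segments = minimum_segments  # value not given in the post
        self.cached = None  # stands in for the "internal variable"

    def target(self, uptime_days, undostat_max=None,
               awr_undostat_max=None, awr_rollstat_max=None):
        # Instance up for more than 7 days: use v$undostat directly.
        if uptime_days > 7:
            return undostat_max
        # First call: consult AWR history if present, then cache the result.
        if self.cached is None:
            if awr_undostat_max is not None:      # from wrh$_undostat
                self.cached = awr_undostat_max
            elif awr_rollstat_max is not None:    # from wrh$_rollstat
                self.cached = awr_rollstat_max
            else:                                 # no usable snapshot data
                self.cached = self.minimum_segments
        # Later calls: reuse the cached value.
        return self.cached


est = RampUpEstimator(minimum_segments=2)
print(est.target(uptime_days=1, awr_undostat_max=40))  # 40
print(est.target(uptime_days=2))                       # 40
print(est.target(uptime_days=8, undostat_max=25))      # 25
```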

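How to stop SMON from OFFLINING UNDO SEGMENTs?

You can disable SMON OFFLINE UNDO SEGS by setting the diagnostic event event='10511 trace name context forever, level 1'. The 10511 event does not bypass "Fast Ramp Up", however; it only limits the workload SMON generates against UNDO SEGS. Once the 10511 event is set, all UNDO SEGS that have already been created stay ONLINE.

To set it: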

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
[oracle@vrh8 ~]$ oerr ora 10511
10511, 00000, "turn off SMON check to cleanup undo dictionary"
// *Cause:
// *Action:
SQL> alter system set events '10511 trace name context forever,level 1';
System altered.

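BUGs related to OFFLINE UNDO SEGS

Listed below are some published BUGs involving SMON OFFLINE UNDO SEGS. They generally exist in versions before 10.2.0.3. If you do hit one, then besides considering an upgrade you can use the 10511 event as a workaround: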

Hdr: 2726601 9.2.0.2 RDBMS 9.2.0.2 TXN MGMT LOCAL PRODID-5 PORTID-46 ORA-600 3439552
Abstract: ORA-600 [4406] IN ROUTINE KTCRAB(); 4 NODE RAC CLUSTER

Hdr: 6878461 9.2.0.4.0 RDBMS 9.2.0.4.0 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-601 5079978
Abstract: ESSC: ORA-601 ORA-474 AFTER OFFLINING UNDO SEGMENTS

Hdr: 4253991 9.2.0.4.0 RDBMS 9.2.0.4.0 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-600 2660394
Abstract: ORA-600 [KTSXR_ADD-4] FOLLOWED BY ORA-600 [KTSISEGINFO1]

Hdr: 2696314 9.2.0.2.0 RDBMS 9.2.0.2.0 TXN MGMT LOCAL PRODID-5 PORTID-46
Abstract: RECEIVING ORA-600: [KTUSMGMCT-01] AFTER APPLYING 92020 PATCH SET

Hdr: 3578807 9.2.0.4 RDBMS 9.2.0.4 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-600
Abstract: OERI 4042 RAISED INTERMITTENTLY

Hdr: 2727303 9.2.0.1.0 RDBMS 9.2.0.1.0 TXN MGMT LOCAL PRODID-5 PORTID-100 ORA-600
Abstract: [RAC] ORA-600: [KTUSMGMCT-01] ARE OCCURED IN HIGH LOAD

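Understanding the SMON Features You Didn't Know About (): Shrink UNDO (rollback) SEGMENT

SMON's routine management of Undo (Rollback) segments does not stop at OFFLINE UNDO SEGMENT. In AUM mode (automatic undo management, also called SMU), SMON also periodically shrinks Rollback/undo segments.

Trigger scenarios

In AUM, this shrinking of the undo extents of rollback/undo segments can be triggered by several conditions:

§ When the transaction table of another rollback segment urgently needs undo space

§ When SMON performs its periodic undo/rollback management (once every 12 hours):

§ SMON reclaims undo space from idle undo segments so that space is available when other transaction tables need it. A further benefit is that the undo datafile does not balloon so fast that the user has to resize it

§ When undo space is under pressure, especially when UNDO STEAL occurs. The SGA records how many undo steals foreground processes perform because of undo space pressure (see UNXPSTEALCNT and EXPSTEALCNT in v$undostat); if the number of UNDO STEALs exceeds a certain threshold, SMON tries to shrink the transaction table

When SMON actually shrinks rollback/undo, it proceeds as follows:

Compute the average undo retention size with the following formula:

retention size = (undo_retention * undo_rate) / (#online_transaction_table_segment, the number of online rollback segments)

For each undo segment:

§ If it is an offline undo segment, reclaim all of its expired undo extents, keeping a minimum of 2 extents of space

§ If it is an online undo segment, reclaim all of its expired undo extents, but keep the space occupied by the segment no smaller than the size corresponding to the average retention.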
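
A worked example may make the formula and the two per-segment rules easier to follow. The sketch below is a toy model only. It assumes undo_retention is in seconds and undo_rate is in undo blocks per second, so the result is in blocks, and it assumes extents are considered in list order; none of this is stated in the text above. All names are hypothetical.

```python
def average_retention_size(undo_retention, undo_rate, online_segments):
    """retention size = (undo_retention * undo_rate) / number of online segments."""
    return (undo_retention * undo_rate) / online_segments


def reclaimable_extents(extents, online, retention_size):
    """Return the indexes of the extents that the rules above allow freeing.

    extents: list of (size_in_blocks, expired) tuples for one undo segment.
    """
    remaining_count = len(extents)
    remaining_size = sum(size for size, _ in extents)
    freed = []
    for idx, (size, expired) in enumerate(extents):
        if not expired:
            continue
        if online:
            # Online segment: never shrink below the average retention size.
            if remaining_size - size < retention_size:
                continue
        else:
            # Offline segment: always keep at least 2 extents.
            if remaining_count - 1 < 2:
                continue
        freed.append(idx)
        remaining_count -= 1
        remaining_size -= size
    return freed


retention = average_retention_size(undo_retention=900, undo_rate=10,
                                   online_segments=10)
print(retention)  # 900.0

online_seg = [(8, False), (128, True), (1024, True), (1024, True)]
print(reclaimable_extents(online_seg, online=True, retention_size=retention))
# [1, 2]

offline_seg = [(8, True), (8, True), (8, True), (8, True)]
print(reclaimable_extents(offline_seg, online=False, retention_size=retention))
# [0, 1]
```

In the online case the last 1024-block extent is kept because freeing it would leave only 8 blocks, well under the 900-block average; in the offline case two extents always survive.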

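Note that SMON's periodic Shrink happens only once every 12 hours; when it happens, details can be found in the SMON process TRACE.

If there are large transactions in the system, rollback/undo segments may grow very large; depending on transaction sizes, the undo/rollback segments in the undo tablespace end up with an irregular distribution of space usage.

SMON's periodic cleanup of undo/rollback segments works like a sledgehammer on steel: it beats these irregularly sized online segments into uniformly sized rollback segments for future use.

Of course this periodic shrink can also get in the way. The undo segment header is locked during the shrink, so with very low probability a transaction may hit an ORA-1551 error: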

[oracle@vmac1 ~]$ oerr ora 1551
01551, 00000, "extended rollback segment, pinned blocks released"
// *Cause: Doing recursive extent of rollback segment, trapped internally
//        by the system
// *Action: None

How to stop SMON from SHRINKING UNDO SEGMENTs?

You can disable SMON's shrinking of UNDO SEGS by setting the diagnostic event event='10512 trace name context forever, level 1':

SQL> select * from global_name;
GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmaclean.com
SQL> alter system set events '10512 trace name context forever,level 1';
System altered.

Related BUGs

These BUGs are concentrated in versions before 9.2.0.8 and have almost disappeared since 10.2.0.3:

Bug 1955307 – SMON may self-deadlock (ORA-60) shrinking a rollback segment in SMU mode [ID 1955307.8]
Bug 3476871 : SMON ORA-60 ORA-474 ORA-601 AND DATABASE CRASHED
Bug 5902053 : SMON WAITING ON ‘UNDO SEGMENT TX SLOT’ HANGS DATABASE
Bug 6084112 : INSTANCE SLOW SHOW SEVERAL LONGTIME RUNNING WAIT EVENTS

Understanding the SMON Features You Didn't Know About (11): Transaction Recover

SMON's role also includes Transaction Recover at startup:

SMON: enabling cache recovery
Archived Log entry 87 added for thread 1 sequence 58 ID 0xa044e7d dest 1:
[15190] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:421305354 end:421305534 diff:180 (1 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery

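In <Understanding the SMON Features You Didn't Know About (): Recover Dead transaction> we covered SMON's cleanup of dead transactions. The TX recovery that SMON starts when the database opens does similar work to Recover Dead transaction, and the fast_start_parallel_rollback parameter determines the degree of parallelism SMON uses when rolling back transactions (see the original post for details).

Note, however, that TX recovery at actual startup is much more complex than ordinary Dead transaction recovery. The rough steps are:

1. Active Transactions in the SYSTEM rollback segment (Undo Segment Number 0) are rolled back first, with top priority

2. Active Transactions in other rollback segments are marked 'DEAD'

3. SMON then scans the non-SYSTEM rollback segments and rolls back the dead transactions; a typical call stack is: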

 kturec <- kturax <- ktprbeg <- ktmmon <- ktmSmonMain

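4. SMON still scans the rollback segments listed in _OFFLINE_ROLLBACK_SEGMENTS but does not roll back the Active Transactions on them; if corruption is found it only reports an error

5. SMON ignores the rollback segments listed in _CORRUPTED_ROLLBACK_SEGMENTS and does not even scan them at startup; all transactions pointing to such rollback segments are considered committed.

Specifically, SMON starts Transaction Recover inside the call to the ktuini function; the classic stack calls for this function are: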

adbdrv -> ktuini -> ktuiup -> kturec -> kturrt
or
adbdrv -> ktuini -> ktuiof -> ktunti -> kqrpre -> kqrpre1 -> ktuscr

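Here the ktuiof function evaluates the values of _OFFLINE_ROLLBACK_SEGMENTS and _CORRUPTED_ROLLBACK_SEGMENTS and stores this important rollback segment information in a fixed array.
Note that the SYSTEM rollback segment is a critical bootstrap object, so we cannot specify the system rollback segment as offline or corrupted.

The rough steps SMON follows when performing Transaction Recover are:

Call ktuiof to save the rollback segments listed in _OFFLINE_ROLLBACK_SEGMENTS and _CORRUPTED_ROLLBACK_SEGMENTS

Call the ktuiup function to start recovering dead transactions on the rollback segments

Recover, with first priority, the transactions on the SYSTEM rollback segment (USN=0), controlled by the kturec function

Loop over the rows of the undo$ dictionary base table:

FOR usn in undo$ loop
IF usn==0

recover the transactions on the SYSTEM rollback segment that were left unfinished in the first round, again controlled by kturec;

ELSE

mark any active transactions as DEAD, controlled by kturec;

USN++

end loop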
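
The ordering described above can be illustrated with a toy model. It only records which action would be taken for each rollback segment; the function name, the action labels and the input structure are hypothetical, and the handling of the offline and corrupted lists follows the prose steps earlier in this section rather than any documented interface.

```python
def startup_tx_recovery(undo_segments, offline=(), corrupted=()):
    """Return the ordered list of (action, usn) pairs for startup recovery.

    undo_segments: dict mapping usn -> list of transaction states,
    each state being 'ACTIVE' or 'COMMITTED' (hypothetical input).
    """
    actions = []
    # First priority: the SYSTEM rollback segment (USN 0).
    if 0 in undo_segments:
        actions.append(("recover-first", 0))
    # Then loop over every undo$ row.
    for usn in sorted(undo_segments):
        if usn == 0:
            # Finish whatever the first pass left incomplete on SYSTEM.
            actions.append(("recover-remaining", 0))
        elif usn in corrupted:
            # Listed as corrupted: ignored entirely, not even scanned.
            continue
        elif usn in offline:
            # Listed as offline: scanned, but nothing is rolled back.
            actions.append(("scan-only", usn))
        elif "ACTIVE" in undo_segments[usn]:
            # Active transactions are marked DEAD for SMON to roll back later.
            actions.append(("mark-dead", usn))
    return actions


segments = {
    0: ["ACTIVE"],
    1: ["ACTIVE", "COMMITTED"],
    2: ["COMMITTED"],
    3: ["ACTIVE"],
    4: ["ACTIVE"],
}
print(startup_tx_recovery(segments, offline={3}, corrupted={4}))
# [('recover-first', 0), ('recover-remaining', 0), ('mark-dead', 1), ('scan-only', 3)]
```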

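Related diagnostic events

Quite a few diagnostic events are closely related to Transaction Recover; the most important are events 10013 and 10015. Event 10015 also applies to ordinary dead transaction rollback. It is listed under <Transaction Recover> because when opening a database by abnormal means we often run into ORA-600 [4xxx] internal errors; the 10015 event reveals the usn involved, which can then be added to _CORRUPTED_ROLLBACK_SEGMENTS in the form _SYSSMU(USN#)$ to bypass the internal error (note that this no longer works in 11g).

1. 10013, 00000, "Instance Recovery"

2. 10015, 00000, "Undo Segment Recovery"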

Event 10013:

Monitor transaction recovery during startup
SQL> alter system set event='10013 trace name context forever,level 10' scope=spfile;
Event 10015:

Dump undo segment headers before and after transaction recovery
SQL> alter system set event='10015 trace name context forever,level 10' scope=spfile;
System altered.
======================10015 sample trace===========================
UNDO SEG (BEFORE RECOVERY): usn = 0  Extent Control Header
  -----------------------------------------------------------------
  Extent Header:: spare1: 0      spare2: 0      #extents: 6      #blocks: 47
                  last map  0x00000000  #maps: 0      offset: 4128
      Highwater::  0x0040000b  ext#: 0      blk#: 1      ext size: 7
  #blocks in seg. hdr's freelists: 0
  #blocks below: 0
  mapblk  0x00000000  offset: 0
                   Unlocked
     Map Header:: next  0x00000000  #extents: 6    obj#: 0      flag: 0x40000000
  Extent Map
  -----------------------------------------------------------------
   0x0040000a  length: 7
   0x00400011  length: 8
   0x00400181  length: 8
   0x00400189  length: 8
   0x00400191  length: 8
   0x00400199  length: 8      
  TRN CTL:: seq: 0x012c chd: 0x0033 ctl: 0x0026 inc: 0x00000000 nfb: 0x0001
            mgc: 0x8002 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
            uba: 0x0040000b.012c.1b scn: 0x0000.021fa595
Version: 0x01
  FREE BLOCK POOL::
    uba: 0x0040000b.012c.1b ext: 0x0  spc: 0x4a0
    uba: 0x00000000.005c.07 ext: 0x2  spc: 0x1adc
    uba: 0x00000000.0034.37 ext: 0x4  spc: 0x550
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0
    uba: 0x00000000.0000.00 ext: 0x0  spc: 0x0     
  TRN TBL::
  index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num
  ------------------------------------------------------------------------------------------------
   0x00    9    0x00  0x025d  0x002b  0x0000.02215c0b  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x01    9    0x00  0x025d  0x0006  0x0000.0220a58c  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x02    9    0x00  0x025d  0x000e  0x0000.0220a58a  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x03    9    0x00  0x025d  0x000f  0x0000.02215be4  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x04    9    0x00  0x025d  0x0008  0x0000.0220a57a  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x05    9    0x00  0x025d  0x0056  0x0000.0220a583  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x06    9    0x00  0x025d  0x0017  0x0000.0220a58d  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x07    9    0x00  0x025d  0x0050  0x0000.0220a57f  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x08    9    0x00  0x025d  0x0061  0x0000.0220a57c  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x09    9    0x00  0x025d  0x0013  0x0000.02215c01  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x0a    9    0x00  0x025d  0x0022  0x0000.02215bf7  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x0b    9    0x00  0x025d  0x0014  0x0000.02215bdd  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x0c    9    0x00  0x025c  0x003a  0x0000.021ff3fa  0x004001a0  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x0d    9    0x00  0x025d  0x0010  0x0000.02215c05  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x0e    9    0x00  0x025d  0x0001  0x0000.0220a58b  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x0f    9    0x00  0x025d  0x001c  0x0000.02215be6  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x10    9    0x00  0x025d  0x002a  0x0000.02215c07  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x11    9    0x00  0x025d  0x0025  0x0000.02215bf2  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x12    9    0x00  0x025d  0x0018  0x0000.02215bee  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x13    9    0x00  0x025d  0x000d  0x0000.02215c03  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x14    9    0x00  0x025d  0x005a  0x0000.02215bdf  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x15    9    0x00  0x025d  0x0058  0x0000.0220a587  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x16    9    0x00  0x025d  0x000a  0x0000.02215bf6  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x17    9    0x00  0x025d  0x000b  0x0000.0220a58e  0x0040000a  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x18    9    0x00  0x025d  0x0011  0x0000.02215bf0  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x19    9    0x00  0x025c  0x0044  0x0000.021ff410  0x004001a0  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x1a    9    0x00  0x025d  0x005c  0x0000.02215bea  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x1b    9    0x00  0x025d  0x001d  0x0000.02215bfd  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x1c    9    0x00  0x025d  0x001a  0x0000.02215be8  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x1d    9    0x00  0x025d  0x0009  0x0000.02215bff  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x1e    9    0x00  0x025d  0x005f  0x0000.02215bfa  0x0040000b  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x1f    9    0x00  0x025c  0x0032  0x0000.021fa59b  0x0040019f  0x0000.000.00000000  0x00000001   0x00000000   0x0000
   0x20    9    0x00  0x025c  0x0038  0x0000.021fa599  0x0040019f  0x0000.000.00000000  0x00000001   0x00000000   0x0000
The following command analyzes SMON's 10015 trace and lists the names of the rollback segments involved:
[oracle@rh2 bdump]$ cat g10r2_smon_18738.trc|grep "usn ="|grep -v "usn = 0" |awk '{print "_SYSSMU"$7"$"}'|sort -u
_SYSSMU1$
_SYSSMU10$
_SYSSMU2$
_SYSSMU3$
_SYSSMU4$
_SYSSMU5$
_SYSSMU6$
_SYSSMU7$
_SYSSMU8$
_SYSSMU9$
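
For platforms without grep and awk, roughly the same extraction can be done in Python. This is a sketch that assumes the "usn = N" header format shown in the sample trace above; the function name is hypothetical. Unlike sort -u, it orders the names numerically, so _SYSSMU10$ comes after _SYSSMU9$ rather than after _SYSSMU1$.

```python
import re

USN_RE = re.compile(r"usn = (\d+)")


def rollback_segment_names(trace_text):
    """List _SYSSMU<n>$ names for every non-SYSTEM usn found in the trace."""
    usns = set()
    for line in trace_text.splitlines():
        match = USN_RE.search(line)
        if match and int(match.group(1)) != 0:  # skip the SYSTEM segment
            usns.add(int(match.group(1)))
    return [f"_SYSSMU{usn}$" for usn in sorted(usns)]


sample = """UNDO SEG (BEFORE RECOVERY): usn = 0  Extent Control Header
UNDO SEG (BEFORE RECOVERY): usn = 10  Extent Control Header
UNDO SEG (BEFORE RECOVERY): usn = 2  Extent Control Header
UNDO SEG (BEFORE RECOVERY): usn = 2  Extent Control Header
"""
print(rollback_segment_names(sample))  # ['_SYSSMU2$', '_SYSSMU10$']
```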

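Understanding the SMON Features You Didn't Know About (12): Instance Recovery

SMON's role also includes Instance Recovery in a RAC environment. Note that Instance Recovery in this sense is not what we usually mean when we casually say "instance recovery". In everyday speech that phrase mostly refers to Crash Recovery, and the two differ. Recovery after a single instance, or after all nodes of a RAC, has crashed is called Crash Recovery. When one node of a RAC fails and a surviving instance applies the redo of the failed node's thread, we call it Instance Recovery. More on Crash Recovery can be found in the article <The Real cache recovery Revealed>.

Observed behavior

Instance Recovery consists of two recovery stages, cache recovery and ges/gcs remaster, and note that the two proceed at the same time. The lead actor in cache recovery is the SMON process on the surviving node; SMON distributes redo to the slave processes. ges/gcs remaster is carried out by LMON, a RAC-specific process.

The whole Reconfiguration process is shown in the figure below: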


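Note that when Crash Detected occurs the database enters Partial Availability; from Freeze Lockdb onward it is at None Availability; when IR applies redo (roll forward) it returns to Partial Availability. Rollback is performed after roll forward completes, but by then the database is already in Full Availability, as shown in the figure below: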

The graphic illustrates the degree of database availability during each step of Oracle instance recovery:

A.         Real Application Clusters is running on multiple nodes.

B.         Node failure is detected.

C.         The enqueue part of the GRD is reconfigured; resource management is redistributed to the surviving nodes. This operation occurs relatively quickly.

D.        The cache part of the GRD is reconfigured and SMON reads the redo log of the failed instance to identify the database blocks that it needs to recover.

E.         SMON issues the GRD requests to obtain all the database blocks it needs for recovery. After the requests are complete, all other blocks are accessible.

F.         The Oracle server performs roll forward recovery. Redo logs of the failed threads are applied to the database, and blocks are available right after their recovery is completed.

G.        The Oracle server performs rollback recovery. Undo blocks are applied to the database for all uncommitted transactions.

H.        Instance recovery is complete and all data is accessible.

Note: The dashed line represents the blocks identified in step 2 in the previous slide. Also, the dotted steps represent the ones identified in the previous slide.

Let's actually observe an Instance Recovery:

INST 1:

SQL> select * from v$version;

BANNER

--------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

PL/SQL Release 11.2.0.2.0 - Production

CORE    11.2.0.2.0      Production

TNS for Linux: Version 11.2.0.2.0 - Production

NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME

--------------------------------------------------------------------------------

www.oracledatabase12g.com

SQL> alter system set event='10426 trace name context forever,level 12' scope=spfile;  -- 10426 event Reconfiguration trace event

System altered.

SQL> startup force;

ORACLE instance started.

INST 2:

SQL> shutdown abort

ORACLE instance shut down.

=============================================================

========================alert.log============================

Reconfiguration started (old inc 4, new inc 6)

List of instances:

1 (myinst: 1)

Global Resource Directory frozen

* dead instance detected - domain 0 invalid = TRUE

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Instance recovery: looking for dead threads

Beginning instance recovery of 1 threads

parallel recovery started with 2 processes                 --2 recovery slave

Submitted all GCS remote-cache requests

Post SMON to start 1st pass IR

Fix write in gcs resources

Reconfiguration complete

Started redo scan

Completed redo scan

read 88 KB redo, 82 data blocks need recovery

Started redo application at

Thread 2: logseq 374, block 2, scn 54624376

Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0

Mem# 0: +DATA/prod/onlinelog/group_4.271.747100549

Mem# 1: +DATA/prod/onlinelog/group_4.272.747100553

Completed redo application of 0.07MB

Completed instance recovery at

Thread 2: logseq 374, block 178, scn 54646382

73 data blocks read, 83 data blocks written, 88 redo k-bytes read

Thread 2 advanced to log sequence 375 (thread recovery)

Redo thread 2 internally disabled at seq 375 (SMON)

ARC3: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (thread 2 sequence 374) (PROD1)

Setting Resource Manager plan SCHEDULER[0x310B]:DEFAULT_MAINTENANCE_PLAN via scheduler window

Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter

ARC3: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (PROD1)

2011-06-27 22:19:29.280000 +08:00

Archived Log entry 792 added for thread 2 sequence 374 ID 0x9790ab2 dest 1:

ARC0: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (thread 2 sequence 375) (PROD1)

2011-06-27 22:19:30.336000 +08:00

ARC0: Archiving disabled thread 2 sequence 375

ARC0: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (PROD1)

Archived Log entry 793 added for thread 2 sequence 375 ID 0x9790ab2 dest 1:

minact-scn: Master considers inst:2 dead

==================================================================
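
When comparing several recoveries, it helps to pull the key figures out of alert.log. The sketch below assumes the two summary lines keep exactly the wording shown in the alert.log excerpt above; the function and dictionary key names are hypothetical.

```python
import re

SCAN_RE = re.compile(r"read (\d+) KB redo, (\d+) data blocks need recovery")
DONE_RE = re.compile(
    r"(\d+) data blocks read, (\d+) data blocks written, (\d+) redo k-bytes read"
)


def instance_recovery_summary(alert_log_text):
    """Collect redo-scan and block counts reported during instance recovery."""
    summary = {}
    for line in alert_log_text.splitlines():
        match = SCAN_RE.search(line)
        if match:
            summary["redo_kb_scanned"] = int(match.group(1))
            summary["blocks_needing_recovery"] = int(match.group(2))
            continue
        match = DONE_RE.search(line)
        if match:
            summary["blocks_read"] = int(match.group(1))
            summary["blocks_written"] = int(match.group(2))
            summary["redo_kb_read"] = int(match.group(3))
    return summary


sample = """Completed redo scan
read 88 KB redo, 82 data blocks need recovery
Completed redo application of 0.07MB
73 data blocks read, 83 data blocks written, 88 redo k-bytes read
"""
print(instance_recovery_summary(sample))
```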

===========================smon trace begin=======================

*** 2011-06-27 22:19:28.279

2011-06-27 22:19:28.279849 : Start recovery for domain=0, valid=0, flags=0x0

Successfully allocated 2 recovery slaves

Using 67 overflow buffers per recovery slave

Thread 2 checkpoint: logseq 374, block 2, scn 54624376

cache-low rba: logseq 374, block 2

on-disk rba: logseq 374, block 178, scn 54626382

start recovery at logseq 374, block 2, scn 54624376

Instance recovery not required for thread 1

*** 2011-06-27 22:19:28.487

Started writing zeroblks thread 2 seq 374 blocks 178-185

*** 2011-06-27 22:19:28.487

Completed writing zeroblks thread 2 seq 374

==== Redo read statistics for thread 2 ====

Total physical reads (from disk and memory): 4096Kb

-- Redo read_disk statistics --

Read rate (ASYNC): 88Kb in 0.18s => 0.48 Mb/sec

Longest record: 8Kb, moves: 0/186 (0%)

Longest LWN: 33Kb, moves: 0/47 (0%), moved: 0Mb

Last redo scn: 0x0000.0341884d (54626381)

----------------------------------------------

----- Recovery Hash Table Statistics ---------

Hash table buckets = 262144

Longest hash chain = 1

Average hash chain = 82/82 = 1.0

Max compares per lookup = 1

Avg compares per lookup = 248/330 = 0.8

----------------------------------------------

*** 2011-06-27 22:19:28.489

KCRA: start recovery claims for 82 data blocks

*** 2011-06-27 22:19:28.526

KCRA: blocks processed = 82/82, claimed = 81, eliminated = 1

2011-06-27 22:19:28.526088 : Validate domain 0

**************** BEGIN RECOVERY HA STATS  ****************

I'm the recovery instance

smon posted (1278500359646), recovery started 0.027 secs,(1278500359673)

domain validated 0.242 secs (1278500359888)

claims opened 70, claims converted 11, claims preread 0

****************  END RECOVERY HA STATS  *****************

2011-06-27 22:19:28.526668 : Validated domain 0, flags = 0x0

*** 2011-06-27 22:19:28.556

Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0

*** 2011-06-27 22:19:28.560

Completed redo application of 0.07MB

*** 2011-06-27 22:19:28.569

Completed recovery checkpoint

----- Recovery Hash Table Statistics ---------

Hash table buckets = 262144

Longest hash chain = 1

Average hash chain = 82/82 = 1.0

Max compares per lookup = 1

Avg compares per lookup = 330/330 = 1.0

----------------------------------------------

*** 2011-06-27 22:19:28.572 5401 krsg.c

Acquiring RECOVERY INFO PING latch from [krsg.c:5401] IX0

*** 2011-06-27 22:19:28.572 5401 krsg.c

Successfully acquired RECOVERY INFO PING latch IX+

*** 2011-06-27 22:19:28.572 5406 krsg.c

Freeing RECOVERY INFO PING latch from [krsg.c:5406] IX0

*** 2011-06-27 22:19:28.572 5406 krsg.c

Successfully freed RECOVERY INFO PING latch IX-

krss_sched_work: Prod archiver request from process SMON (function:0x2000)

krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)

krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)

krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)

krss_find_arc: Evaluating ARC2 to receive message (flags 0x2)

krss_find_arc: Selecting ARC2 to receive REC PING message

*** 2011-06-27 22:19:28.572 3093 krsv.c

krsv_send_msg: Sending message to process ARC2

*** 2011-06-27 22:19:28.572 1819 krss.c

krss_send_arc: Sent message to ARC2 (message:0x2000)

Recovery sets nab of thread 2 seq 374 to 178 with 8 zeroblks

Retrieving log 4

pre-aal: xlno:4 flno:0 arf:0 arb:2 arh:2 art:4

Updating log 3 thread 2 sequence 375

Previous log 3 thread 2 sequence 0

Updating log 4 thread 2 sequence 374

Previous log 4 thread 2 sequence 374

post-aal: xlno:4 flno:0 arf:3 arb:2 arh:2 art:3

krss_sched_work: Prod archiver request from process SMON (function:0x1)

krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)

krss_find_arc: Selecting ARC3 to receive message

*** 2011-06-27 22:19:28.589 3093 krsv.c

krsv_send_msg: Sending message to process ARC3

*** 2011-06-27 22:19:28.589 1819 krss.c

krss_send_arc: Sent message to ARC3 (message:0x1)

Retrieving log 2

Kicking thread 1 to switch logfile

Retrieving log 4

Retrieving log 3

krss_sched_work: Prod archiver request from process SMON (function:0x1)

krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)

krss_find_arc: Selecting ARC0 to receive message

*** 2011-06-27 22:19:28.599 3093 krsv.c

krsv_send_msg: Sending message to process ARC0

*** 2011-06-27 22:19:28.599 1819 krss.c

krss_send_arc: Sent message to ARC0 (message:0x1)

*** 2011-06-27 22:19:28.599 838 krsv.c

krsv_dpga: Waiting for pending I/O to complete

*** 2011-06-27 22:19:29.304

krss_sched_work: Prod archiver request from process SMON (function:0x1)

krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)

krss_find_arc: Selecting ARC1 to receive message

*** 2011-06-27 22:19:29.304 3093 krsv.c

krsv_send_msg: Sending message to process ARC1

*** 2011-06-27 22:19:29.304 1819 krss.c

krss_send_arc: Sent message to ARC1 (message:0x1)

SMON[INST-TXN-RECO]:about to recover undo segment 11 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 11 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 12 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 12 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 13 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 13 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 14 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 14 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 15 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 15 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 16 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 16 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 17 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 17 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 18 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 18 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 19 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 19 as available status:2 ret:0

SMON[INST-TXN-RECO]:about to recover undo segment 20 status:3 inst:2

SMON[INST-TXN-RECO]: mark undo segment 20 as available status:2 ret:0

*** 2011-06-27 22:19:43.299

* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0

* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:

single window RM request queue is empty

multi-window RM request queue is empty

* Global DRM state ---:

There is no dynamic remastering

RM lock state = 0

pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1

flg x0 type x0 afftime x36e6e3a8

nreplays by lms 0 = 0

* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue

* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0

* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:22:18.333

* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0

* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:

single window RM request queue is empty

multi-window RM request queue is empty

* Global DRM state ---:

There is no dynamic remastering

RM lock state = 0

pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1

flg x0 type x0 afftime x36e6e3a8

nreplays by lms 0 = 0

* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue

* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0

* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:24:53.365

* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0

* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:

single window RM request queue is empty

multi-window RM request queue is empty

* Global DRM state ---:

There is no dynamic remastering

RM lock state = 0

pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1

flg x0 type x0 afftime x36e6e3a8

nreplays by lms 0 = 0

* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue

* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0

* kju_tsn_aff_drm_pending TRACEUD: 2 return true

========================================================================
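
The SMON[INST-TXN-RECO] lines in the trace above come in pairs, one when recovery of an undo segment is about to start and one when the segment is marked available. A small script can show which segments were started but never marked available. This sketch assumes the two message formats shown in the trace; the function name is hypothetical, and the meaning of the status and ret values is not interpreted.

```python
import re

ABOUT_RE = re.compile(
    r"SMON\[INST-TXN-RECO\]:\s*about to recover undo segment (\d+)"
)
MARK_RE = re.compile(
    r"SMON\[INST-TXN-RECO\]:\s*mark undo segment (\d+) as available"
)


def pending_undo_segments(trace_text):
    """Return undo segments with an 'about to recover' line but no 'mark' line."""
    started, finished = [], set()
    for line in trace_text.splitlines():
        match = ABOUT_RE.search(line)
        if match:
            started.append(int(match.group(1)))
            continue
        match = MARK_RE.search(line)
        if match:
            finished.add(int(match.group(1)))
    return [usn for usn in started if usn not in finished]


sample = """SMON[INST-TXN-RECO]:about to recover undo segment 11 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 11 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 12 status:3 inst:2
"""
print(pending_undo_segments(sample))  # [12]
```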

==============================lmon trace begin==========================

*** 2011-06-27 22:19:27.748

kjxgmpoll reconfig instance map: 1

*** 2011-06-27 22:19:27.748

kjxgmrcfg: Reconfiguration started, type 1

CGS/IMR TIMEOUTS:

CSS recovery timeout = 31 sec (Total CSS waittime = 65)

IMR Reconfig timeout = 75 sec

CGS rcfg timeout = 85 sec

kjxgmcs: Setting state to 4 0.

*** 2011-06-27 22:19:27.759

Name Service frozen

kjxgmcs: Setting state to 4 1.

kjxgrdecidever: No old version members in the cluster

kjxgrssvote: reconfig bitmap chksum 0x2137452d cnt 1 master 1 ret 0

kjxgrpropmsg: SSMEMI: inst 1 - no disk vote

kjxgrpropmsg: SSVOTE: Master indicates no Disk Voting

2011-06-27 22:19:27.760783 : kjxgrDiskVote: nonblocking method is chosen

kjxggpoll: change poll time to 50 ms

2011-06-27 22:19:27.918847 : kjxgrDiskVote: Obtained RR update lock for sequence 5, RR seq 4

2011-06-27 22:19:28.023160 : kjxgrDiskVote: derive membership from CSS (no disk votes)

2011-06-27 22:19:28.023240 : proposed membership: 1

*** 2011-06-27 22:19:28.081

2011-06-27 22:19:28.081952 : kjxgrDiskVote: new membership is updated by inst 1, seq 6

2011-06-27 22:19:28.082073 : kjxgrDiskVote: bitmap: 1

CGS/IMR TIMEOUTS:

CSS recovery timeout = 31 sec (Total CSS waittime = 65)

IMR Reconfig timeout = 75 sec

CGS rcfg timeout = 85 sec

kjxgmmeminfo: can not invalidate inst 2

kjxgmps: proposing substate 2

kjxgmcs: Setting state to 6 2.

kjfmSendAbortInstMsg: send an abort message to instance 2

kjfmuin: inst bitmap 1

kjfmmhi: received msg from inst 1 (inc 2)

Performed the unique instance identification check

kjxgmps: proposing substate 3

kjxgmcs: Setting state to 6 3.

Name Service recovery started

Deleted all dead-instance name entries

kjxgmps: proposing substate 4

kjxgmcs: Setting state to 6 4.

Multicasted all local name entries for publish

Replayed all pending requests

kjxgmps: proposing substate 5

kjxgmcs: Setting state to 6 5.

Name Service normal

Name Service recovery done

*** 2011-06-27 22:19:28.191

kjxgmps: proposing substate 6

kjxgmcs: Setting state to 6 6.

kjxgmcs: total reconfig time 0.432 seconds (from 2895072218 to 2895072650)

kjxggpoll: change poll time to 600 ms

kjfmact: call ksimdic on instance (2)

2011-06-27 22:19:28.211846 :

********* kjfcrfg() called, BEGIN LMON RCFG *********

2011-06-27 22:19:28.211906 : * Begin lmon rcfg step KJGA_RCFG_BEGIN

* kjfcrfg: Resource broadcasting disabled

* kjfcrfg: kjfcqiora returned success

kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)

2011-06-27 22:19:28.211954 :

Reconfiguration started (old inc 4, new inc 6)

TIMEOUTS:

Local health check timeout: 70 sec

Rcfg process freeze timeout: 70 sec

Remote health check timeout: 140 sec

Defer Queue timeout: 163 secs

CGS rcfg timeout: 85 sec

Synchronization timeout: 248 sec

DLM rcfg timeout: 744 sec

List of instances:

1 (myinst: 1)

Undo tsn affinity 1

OMF 0

2011-06-27 22:19:28.212394 : * Begin lmon rcfg step KJGA_RCFG_FREEZE

*** 2011-06-27 22:19:28.233

* published: inc 6, isnested 0, rora req 0,

rora start 0, rora invalid 0, (roram 32767), isrcvinst 1,

(rcvinst 1), isdbopen 1, drh 0, (myinst 1)

thread 1, isdbmounted 1, sid hash x0

* kjfcrfg: published bigns successfully

* Force-published at step 3

2011-06-27 22:19:28.233575 :  Global Resource Directory frozen

* roram 32767, rcvinst 1

* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0

* kjfcrfg: queried bigns successfully

inst 1

* kjfcrfg: single_instance_kjga = TRUE

asby init, 0/1/x2

asby returns, 0/1/x2/false

* Domain maps before reconfiguration:

*   DOMAIN 0 (valid 1): 1 2

* End of domain mappings

* dead instance detected - domain 0 invalid = TRUE

* Domain maps after recomputation:

*   DOMAIN 0 (valid 0): 1

* End of domain mappings

2011-06-27 22:19:28.235110 : * Begin lmon rcfg step KJGA_RCFG_COMM

2011-06-27 22:19:28.235242 : GSIPC:KSXPCB: msg 0xd8b84550 status 34, type 2, dest 2, rcvr 0

2011-06-27 22:19:28.235339 : GSIPC:KSXPCB: msg 0xd8b80180 status 34, type 2, dest 2, rcvr 1

Active Sendback Threshold = 50 %

Communication channels reestablished

2011-06-27 22:19:28.240076 : * Begin lmon rcfg step KJGA_RCFG_EXCHANGE

2011-06-27 22:19:28.240192 : * Begin lmon rcfg step KJGA_RCFG_ENQCLEANUP

Master broadcasted resource hash value bitmaps

2011-06-27 22:19:28.251474 :

Non-local Process blocks cleaned out

2011-06-27 22:19:28.251822 : * Begin lmon rcfg step KJGA_RCFG_CLEANUP

2011-06-27 22:19:28.265220 : * Begin lmon rcfg step KJGA_RCFG_TIMERQ

2011-06-27 22:19:28.265308 : * Begin lmon rcfg step KJGA_RCFG_DDQ

2011-06-27 22:19:28.265393 : * Begin lmon rcfg step KJGA_RCFG_SETMASTER

2011-06-27 22:19:28.271551 :

Set master node info

2011-06-27 22:19:28.271931 : * Begin lmon rcfg step KJGA_RCFG_ENQREPLAY

2011-06-27 22:19:28.275490 :  Submitted all remote-enqueue requests

2011-06-27 22:19:28.275596 : * Begin lmon rcfg step KJGA_RCFG_ENQDUBIOUS

Dwn-cvts replayed, VALBLKs dubious

2011-06-27 22:19:28.277223 : * Begin lmon rcfg step KJGA_RCFG_ENQGRANT

All grantable enqueues granted

2011-06-27 22:19:28.277992 : * Begin lmon rcfg step KJGA_RCFG_PCMREPLAY

2011-06-27 22:19:28.279234 :

2011-06-27 22:19:28.279255 :  Post SMON to start 1st pass IR               --SMON posted by LMON

2011-06-27 22:19:28.307890 :  Submitted all GCS cache requests             --IR acquires all gcs resource needed for recovery

2011-06-27 22:19:28.308038 : * Begin lmon rcfg step KJGA_RCFG_FIXWRITES

Post SMON to start 1st pass IR

Fix write in gcs resources

2011-06-27 22:19:28.313508 : * Begin lmon rcfg step KJGA_RCFG_END

2011-06-27 22:19:28.313720 :

2011-06-27 22:19:28.313733 :

Reconfiguration complete

*   domain 0 valid?: 0

* kjfcrfg: ask RMS0 to do pnp work

**************** BEGIN DLM RCFG HA STATS  ****************

Total dlm rcfg time (inc 6): 0.100 secs (1278500359581, 1278500359681)

Begin step .........: 0.001 secs (1278500359581, 1278500359582)

Freeze step ........: 0.020 secs (1278500359582, 1278500359602)

Remap step .........: 0.002 secs (1278500359602, 1278500359604)

Comm step ..........: 0.005 secs (1278500359604, 1278500359609)

Sync 1 step ........: 0.000 secs (0, 0)

Exchange step ......: 0.000 secs (1278500359609, 1278500359609)

Sync 2 step ........: 0.000 secs (0, 0)

Enqueue cleanup step: 0.011 secs (1278500359609, 1278500359620)

Sync pcm1 step .....: 0.000 secs (0, 0)

Cleanup step .......: 0.013 secs (1278500359620, 1278500359633)

Timerq step ........: 0.000 secs (1278500359633, 1278500359633)

Ddq step ...........: 0.000 secs (1278500359633, 1278500359633)

Set master step ....: 0.006 secs (1278500359633, 1278500359639)

Sync 3 step ........: 0.000 secs (0, 0)

Enqueue replay step : 0.004 secs (1278500359639, 1278500359643)

Sync 4 step ........: 0.000 secs (0, 0)

Enqueue dubious step: 0.001 secs (1278500359643, 1278500359644)

Sync 5 step ........: 0.000 secs (0, 0)

Enqueue grant step .: 0.001 secs (1278500359644, 1278500359645)

Sync 6 step ........: 0.000 secs (0, 0)

PCM replay step ....: 0.030 secs (1278500359645, 1278500359675)

Sync 7 step ........: 0.000 secs (0, 0)

Fixwrt replay step .: 0.003 secs (1278500359675, 1278500359678)

Sync 8 step ........: 0.000 secs (0, 0)

End step ...........: 0.001 secs (1278500359680, 1278500359681)

Number of replayed enqueues sent / received .......: 0 / 0

Number of replayed fusion locks sent / received ...: 0 / 0

Number of enqueues mastered before / after rcfg ...: 2217 / 2941

Number of fusion locks mastered before / after rcfg: 3120 / 5747

****************  END DLM RCFG HA STATS  *****************

*** 2011-06-27 22:19:36.589

kjxgfipccb: msg 0x0x7ff526139320, mbo 0x0x7ff526139310, type 19, ack 0, ref 0, stat 34

=====================================================================

============================lms trace begin==========================

*** 2011-06-27 22:38:54.663

2011-06-27 22:38:54.663764 :  0 GCS shadows cancelled, 0 closed, 0 Xw survived

2011-06-27 22:38:54.673539 :  5230 GCS resources traversed, 0 cancelled

2011-06-27 22:38:54.707671 :  9322 GCS shadows traversed, 0 replayed, 0 duplicates,

5183 not replayed, dissolve 0 timeout 0 RCFG(10) lms 0 finished replaying gcs resources

2011-06-27 22:38:54.709132 :  0 write requests issued in 384 GCS resources--check past image

0 PIs marked suspect, 0 flush PI msgs

2011-06-27 22:38:54.709520 :  0 write requests issued in 273 GCS resources

1 PIs marked suspect, 0 flush PI msgs

2011-06-27 22:38:54.709842 :  0 write requests issued in 281 GCS resources

0 PIs marked suspect, 0 flush PI msgs

2011-06-27 22:38:54.710159 :  0 write requests issued in 233 GCS resources

0 PIs marked suspect, 0 flush PI msgs

2011-06-27 22:38:54.710531 :  0 write requests issued in 350 GCS resources

lms 0 finished fixing gcs write protocol
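
The DLM RCFG HA STATS block in the LMON trace above lists each reconfiguration step with its duration and a pair of tick values. In this trace the tick differences match the reported seconds when read as milliseconds (for example 1278500359602 - 1278500359582 = 20 for the 0.020 sec freeze step); that unit is inferred from these numbers and is not a documented fact. A minimal Python sketch, with hypothetical helper names `parse_ha_stats` and `slowest_step`, that extracts the rows and finds the slowest step:

```python
import re

# A few rows copied from the HA STATS block in the trace above.
SAMPLE = """\
Begin step .........: 0.001 secs (1278500359581, 1278500359582)
Freeze step ........: 0.020 secs (1278500359582, 1278500359602)
Sync 1 step ........: 0.000 secs (0, 0)
Enqueue cleanup step: 0.011 secs (1278500359609, 1278500359620)
Enqueue replay step : 0.004 secs (1278500359639, 1278500359643)
PCM replay step ....: 0.030 secs (1278500359645, 1278500359675)
"""

# "<name> step", optional dot/space padding, then "<secs> secs (<start>, <end>)".
STAT_RE = re.compile(r"^(.*?step)[ .]*:\s*([\d.]+) secs \((\d+), (\d+)\)$")


def parse_ha_stats(text):
    """Return (step name, reported seconds, end tick - start tick) per row."""
    rows = []
    for line in text.splitlines():
        m = STAT_RE.match(line.strip())
        if m:
            name, secs, start, end = m.groups()
            rows.append((name, float(secs), int(end) - int(start)))
    return rows


def slowest_step(rows):
    """Pick the row with the largest tick difference."""
    return max(rows, key=lambda row: row[2])


if __name__ == "__main__":
    rows = parse_ha_stats(SAMPLE)
    for name, secs, ticks in rows:
        print(f"{name:22s} {secs:.3f} secs  ({ticks} ticks)")
    print("slowest:", slowest_step(rows)[0])
```

Lines that do not match the row pattern, such as the banner lines and the replayed-enqueue counters, are simply skipped, so the whole trace file can be passed in unchanged.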

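The biggest difference between Instance Recovery and ordinary Crash Recovery is that instance recovery involves freezing the GRD and remastering GES/GCS resources. Most of this work is done by the LMON process. The trace above shows a number of Reconfiguration steps named KJGA_RCFG_*; their meanings are: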

Reconfiguration Steps:

1.    KJGA_RCFG_BEGIN

LMON continuously polls for a reconfiguration event. Once cgs reports a change in cluster membership,

LMON starts reconfiguration.

2.    KJGA_RCFG_FREEZE

All processes acknowledge the reconfiguration freeze before LMON continues

3.    KJGA_RCFG_REMAP

Updates new instance map (kjfchsu), re-distributes resource mastership. Invalidate recovery domains

if reconfiguration is caused by instance death.

4.    KJGA_RCFG_COMM

Reinitialize communication channel

5.    KJGA_RCFG_EXCHANGE

Exchange of master information of gcs, ges and file affinity master

6.    KJGA_RCFG_ENQCLEANUP

Delete remote dead gcs/ges locks. Cancel converting gcs requests.

7.    KJGA_RCFG_CLEANUP

Cleanup/remove ges resources

8.    KJGA_RCFG_TIMERQ

Restore relative timeout for enqueue locks on timeout queue

9.    KJGA_RCFG_DDQ

Clean out enqueue locks on deadlock queue

10.  KJGA_RCFG_SETMASTER

Update master info for each enqueue resource that needs to be remastered.

11.  KJGA_RCFG_REPLAY

Replay enqueue locks

12.  KJGA_RCFG_ENQDUBIOUS

Invalidates ges resources without established value

13.  KJGA_RCFG_ENQGRANT

Grants all grantable ges lock requests

14.  KJGA_RCFG_REPLAY2

Enqueue reconfiguration complete. Post SMON to start instance recovery.  Starts replaying gcs resources.

15.  KJGA_RCFG_FIXWRITES2

Fix write state of gcs resources

16.  KJGA_RCFG_END

Unfreeze lock database
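
To see how these steps show up in a real trace, the `Begin lmon rcfg step` lines can be extracted along with their timestamps. The following Python sketch uses the first few step lines of the LMON trace shown earlier; the helper names `rcfg_steps` and `step_gaps` are hypothetical, and the line format is assumed to match that trace. Not every step in the list has a `Begin` line in this trace (KJGA_RCFG_REMAP, for instance, does not appear), so a gap measures the time until the next logged step, not necessarily the duration of a single step.

```python
import re
from datetime import datetime, timedelta

# The first five step lines from the LMON trace shown earlier.
SAMPLE = """\
2011-06-27 22:19:28.211906 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
2011-06-27 22:19:28.212394 : * Begin lmon rcfg step KJGA_RCFG_FREEZE
2011-06-27 22:19:28.235110 : * Begin lmon rcfg step KJGA_RCFG_COMM
2011-06-27 22:19:28.240076 : * Begin lmon rcfg step KJGA_RCFG_EXCHANGE
2011-06-27 22:19:28.240192 : * Begin lmon rcfg step KJGA_RCFG_ENQCLEANUP
"""

STEP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) : "
    r"\* Begin lmon rcfg step (KJGA_RCFG_\w+)"
)


def rcfg_steps(trace_text):
    """Return (step name, start timestamp) for every logged step, in order."""
    steps = []
    for line in trace_text.splitlines():
        m = STEP_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            steps.append((m.group(2), ts))
    return steps


def step_gaps(steps):
    """Microseconds from the start of each step to the start of the next."""
    gaps = []
    for (name, start), (_, following) in zip(steps, steps[1:]):
        gaps.append((name, (following - start) // timedelta(microseconds=1)))
    return gaps


if __name__ == "__main__":
    for name, micros in step_gaps(rcfg_steps(SAMPLE)):
        print(f"{name:24s} {micros:>8d} us")
```

On this sample the freeze step accounts for most of the elapsed time, which agrees with the Freeze entry in the HA STATS block of the same trace.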

Diagnostic events related to Instance Recovery
We cannot prevent Instance Recovery from happening. In fact, once an instance crashes, Instance Recovery is mandatory.


The main diagnostic events related to Instance Recovery are 10426 and 29717:

10426 – Reconfiguration trace event

10425 – Enqueue operations

10432 – Fusion activity

10429 – IPC tracing

oerr ora 10425

10425, 00000, "enable global enqueue operations event trace"

// *Document: NO

// *Cause:

// *Action: Dump trace for global enqueue operations.

oerr ora 10426

10426, 00000, "enable ges/gcs reconfiguration event trace"

// *Document: NO

// *Cause:

// *Action: Dump trace for ges/gcs reconfiguration.

oerr ora 10430

10430, 00000, "enable ges/gcs dynamic remastering event trace"

// *Document: NO

// *Cause:

// *Action: Dump trace for ges/gcs dynamic remastering.

oerr ora 10401

10401, 00000, "turn on IPC (ksxp) debugging"

// *Cause:

// *Action: Enables debugging code for IPC service layer (ksxp)

oerr ora 10708

10708, 00000, "print out trace information from the RAC buffer cache"

// *Cause: N/A

// *Action: THIS IS NOT A USER ERROR NUMBER/MESSAGE.  THIS DOES NOT NEED TO BE

//          TRANSLATED OR DOCUMENTED. IT IS USED ONLY FOR DEBUGGING.

oerr ora 29717

29717, 00000, "enable global resource directory freeze/unfreeze event trace"

// *Document: NO

// *Cause:

// *Action: Dump trace for global resource directory freeze/unfreeze.

diag RAC INSTANCE SHUTDOWN LMON

LMON will dump more information to trace during reconfig and freeze.

event="10426 trace name context forever, level 8"

event="29717 trace name context forever, level 5"

or

event="10426 trace name context forever, level 12"

event="10430 trace name context forever, level 12"

event="10401 trace name context forever, level 8"

event="10046 trace name context forever, level 8"

event="10708 trace name context forever, level 15"

event="29717 trace name context forever, level 5"

See the GRD frozen trace produced by event 29717:

alter system set event='29717 trace name context forever, level 5' scope=spfile;

=========================================================================

============================lmon trace begin=============================

********* kjfcrfg() called, BEGIN LMON RCFG *********

2011-06-27 23:13:16.693089 : * Begin lmon rcfg step KJGA_RCFG_BEGIN

* kjfcrfg: Resource broadcasting disabled

* kjfcrfg: kjfcqiora returned success

kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)

2011-06-27 23:13:16.693219 :

Reconfiguration started (old inc 4, new inc 6)

TIMEOUTS:

Local health check timeout: 70 sec

Rcfg process freeze timeout: 70 sec

Remote health check timeout: 140 sec

Defer Queue timeout: 163 secs

CGS rcfg timeout: 85 sec

Synchronization timeout: 248 sec

DLM rcfg timeout: 744 sec

List of instances:

1 (myinst: 1)

Undo tsn affinity 1

OMF 0

[FDB][start]

2011-06-27 23:13:16.701320 : * Begin lmon rcfg step KJGA_RCFG_FREEZE

[FACK][18711 not frozen]          --fack means acknowledge in advance

[FACK][18713 not frozen]

[FACK][18719 not frozen]

[FACK][18721 not frozen]

[FACK][18723 not frozen]

[FACK][18729 not frozen]

[FACK][18739 not frozen]

[FACK][18743 not frozen]

[FACK][18745 not frozen]

[FACK][18747 not frozen]

[FACK][18749 not frozen]

[FACK][18751 not frozen]

[FACK][18753 not frozen]

[FACK][18755 not frozen]

[FACK][18757 not frozen]

[FACK][18759 not frozen]

[FACK][18763 not frozen]

[FACK][18765 not frozen]

[FACK][18767 not frozen]

[FACK][18769 not frozen]

[FACK][18771 not frozen]

[FACK][18775 not frozen]

[FACK][18777 not frozen]

[FACK][18816 not frozen]

[FACK][18812 not frozen]

[FACK][18818 not frozen]

[FACK][18820 not frozen]

[FACK][18824 not frozen]

[FACK][18826 not frozen]

[FACK][18830 not frozen]

[FACK][18835 not frozen]

[FACK][18842 not frozen]

[FACK][18860 not frozen]

[FACK][18865 not frozen]

[FACK][18881 not frozen]

[FACK][18883 not frozen]

[FACK][18909 not frozen]

*** 2011-06-27 23:13:16.724

* published: inc 6, isnested 0, rora req 0,

rora start 0, rora invalid 0, (roram 32767), isrcvinst 0,

(rcvinst 32767), isdbopen 1, drh 0, (myinst 1)

thread 1, isdbmounted 1, sid hash x0

* kjfcrfg: published bigns successfully

* Force-published at step 3

2011-06-27 23:13:16.724764 :  Global Resource Directory frozen

* kjfc_qry_bigns: noone has the rcvinst established yet, set it to the highest open instance = 1

* roram 32767, rcvinst 1

* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0

* kjfcrfg: queried bigns successfully

=====================================================================

==========================lmd0 trace begin===========================

*** 2011-06-27 23:13:16.700

[FFCLI][frozen]

[FUFCLI][normal]



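SMON's duties do not end there. The complete list of its functions:

  1. Performs local instance recovery
  2. Performs OPS/RAC instance recovery
  3. Services sort segment requests
  4. Performs transaction recovery (rollback)
  5. Cleans up temporary segments that are no longer in use
  6. Cleans up temporary tables used by cursors that have been aged out
  7. Cleans up the temporary tables of dead instances
  8.  Deletes rows in the OBJ$ base table for objects that no longer exist
  9.  Cleans up ind$ and indpart$ after a failed online index rebuild
  10. Coalesces extents
  11. Shrinks rollback segments when appropriate
  12. Takes rollback segments offline when appropriate
  13. Recovers dead transactions that crash/instance recovery skipped because a datafile was unavailable (e.g. offline)
  14. Recovers dead transactions caused by a foreground process crash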

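Events that control SMON:

  1. event='10061 trace name context forever, level 10' stops SMON from cleaning up temporary segments (disable SMON from cleaning temp segments)
  2. event='10269 trace name context forever, level 10' stops SMON from coalescing free extents (Don't do coalesces of free space in SMON)
  3. event='10052 trace name context forever' stops SMON from cleaning up the obj$ base table
  4. Set the hidden parameter _column_tracking_level (column usage tracking). It defaults to 1, which enables column usage tracking; setting it to 0 disables column tracking
  5. events '10513 trace name context forever, level 2'; setting event 10513 temporarily stops SMON from recovering dead transactions. This is very useful during certain abnormal recovery operations, but setting it in a normal production environment is not recommended
  6. event='8105 trace name context forever' stops SMON from cleaning up IND$ (Oracle event to turn off smon cleanup for online index build)
  7. events '12500 trace name context forever, level 10'; with event 12500 set, rows in SMON_SCN_TIME can be deleted manually. After the instance is restarted, SMON resumes updating SMON_SCN_TIME normally.
  8. event='10511 trace name context forever, level 1' stops SMON from taking undo segments offline. Event 10511 does not skip "Fast Ramp Up"; it only limits the workload SMON generates on undo segments. Once event 10511 is set, all undo segments that have already been created stay ONLINE.
  9.  event='10512 trace name context forever,level 1' stops SMON from shrinking rollback segments
  10. event='10510 trace name context forever,level 1' disables the check for taking rollback segments offline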
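
When several of these events are needed at once, it can help to keep the numbers, levels and effects in one table and generate the event strings from it. The Python sketch below only restates the list above; the names `SMON_EVENTS` and `event_spec` are hypothetical, and the levels are taken from that list, not from Oracle documentation. It builds the text that goes inside the quotes and does not touch a database.

```python
# Event number -> (level or None, effect), transcribed from the list above.
SMON_EVENTS = {
    10061: (10, "stop SMON from cleaning up temporary segments"),
    10269: (10, "stop SMON from coalescing free extents"),
    10052: (None, "stop SMON from cleaning up obj$"),
    10513: (2, "stop SMON from recovering dead transactions"),
    8105: (None, "stop SMON cleanup of IND$ after online index build"),
    12500: (10, "allow manual deletes from SMON_SCN_TIME"),
    10511: (1, "stop SMON from taking undo segments offline"),
    10512: (1, "stop SMON from shrinking rollback segments"),
    10510: (1, "disable the check for taking rollback segments offline"),
}


def event_spec(number):
    """Build the quoted part of an event setting for a known SMON event."""
    level, _effect = SMON_EVENTS[number]
    spec = f"{number} trace name context forever"
    if level is not None:
        spec += f", level {level}"
    return spec


if __name__ == "__main__":
    for number, (_level, effect) in SMON_EVENTS.items():
        print(f"event='{event_spec(number)}'  # {effect}")
```

An unknown event number raises `KeyError`, which is deliberate: these events change SMON behaviour, so a typo should fail loudly instead of producing a plausible-looking string.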
