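For the past two days the production database has kept reporting ORA-24756. I searched MOS and found a similar case that is attributed to a bug (Bug 19201866 - RECO reports ORA-24756 repeatedly into trace file (Doc ID 19201866.8)). However, it is not fixed in 11.2.0.4 on HP-UX and requires an upgrade to 12.2. The document also says that restarting the instance resolves it, but that is not realistic for a production system.
So I checked the trace files written at the time of the error; they all contain the following: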
*** 2017-02-04 09:17:56.058 ERROR, tran=42.13.2709343, ose=0: ORA-24756: ......
The value tran=42.13.2709343 looked like a distributed transaction ID, so I queried the DBA_2PC_PENDING view:
SYS@db1> COL LOCAL_TRAN_ID FORMAT A13
SYS@db1> COL GLOBAL_TRAN_ID FORMAT A90
SYS@db1> COL STATE FORMAT A10
SYS@db1> COL MIXED FORMAT A3
SYS@db1> COL HOST FORMAT A10
SYS@db1> COL COMMIT# FORMAT A15
SYS@db1> SET LINESIZE 240
SYS@db1> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE,FAIL_TIME,FORCE_TIME,RETRY_TIME, MIXED, HOST, COMMIT#
  2  FROM DBA_2PC_PENDING
  3  /

LOCAL_TRAN_ID GLOBAL_TRAN_ID                                                                             STATE      FAIL_TIME         FORCE_TIME        RETRY_TIME        MIX HOST       COMMIT#
------------- ------------------------------------------------------------------------------------------ ---------- ----------------- ----------------- ----------------- --- ---------- ---------------
42.13.2709343 1096044365.31302E3235352E3233322E32332E746D313438363137313036383638333230333633            collecting 20170204 09:17:55                   20170206 01:12:41 no  bosbpm4s   764631398601
The LOCAL_TRAN_ID matches the tran value in the trace file, so the initial guess was correct. FAIL_TIME is 20170204 09:17:55; in the alert log, the following errors appear around that time:
Sat Feb 04 09:17:50 2017
Error 22 trapped in 2PC on transaction 42.13.2709343. Cleaning up.
Error stack returned to user:
ORA-02050: transaction 42.13.2709343 rolled back, some remote DBs may be in-doubt
ORA-00022: invalid session ID; access denied
ORA-02063: preceding line from LINK_DB2
Sat Feb 04 09:17:56 2017
DISTRIB TRAN 41544f4d.31302E3235352E3233322E32332E746D313438363137313036383638333230333633
  is local tran 42.13.2709343 (hex=2a.0d.29575f)
  insert pending collecting tran, scn=764631398601 (hex=b2.079538c9)
Sat Feb 04 09:17:56 2017
Errors in file /oracle11g/app/oracle/diag/rdbms/db1/db1/trace/db1_reco_23402.trc:
ORA-24756:
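The trace file, the alert log and DBA_2PC_PENDING show the same identifiers in different encodings, and this can be checked mechanically. Below is a minimal Python sketch of that cross-check. The helper names are my own, not Oracle tooling, and reading the part after the dot in the global ID as hex-encoded ASCII is an assumption based on how the values line up in this case.

```python
import re


def tran_from_trace(line: str) -> str:
    """Extract the local transaction ID from a RECO trace line (tran=a.b.c)."""
    match = re.search(r"tran=(\d+\.\d+\.\d+)", line)
    if match is None:
        raise ValueError("no tran= field found in trace line")
    return match.group(1)


def hex_triplet_to_decimal(hex_id: str) -> str:
    """Convert a dotted hex ID such as '2a.0d.29575f' to dotted decimal."""
    return ".".join(str(int(part, 16)) for part in hex_id.split("."))


def scn_from_hex(hex_scn: str) -> int:
    """Combine the 'high.low' hex halves of an SCN; the low half is 32 bits."""
    high, low = hex_scn.split(".")
    return (int(high, 16) << 32) + int(low, 16)


def global_id_alert_to_view(alert_id: str) -> str:
    """Rewrite the alert-log global ID (hex prefix) in the decimal-prefix
    form shown by DBA_2PC_PENDING.GLOBAL_TRAN_ID."""
    prefix, rest = alert_id.split(".", 1)
    return f"{int(prefix, 16)}.{rest}"


def decode_hex_ascii(hex_text: str) -> str:
    """Decode a hex string as ASCII (assumption: the ID is ASCII text)."""
    return bytes.fromhex(hex_text).decode("ascii")


if __name__ == "__main__":
    # Values copied from the trace file, alert log and query output above.
    trace_line = "*** 2017-02-04 09:17:56.058 ERROR, tran=42.13.2709343, ose=0: ORA-24756: ......"
    alert_global = "41544f4d.31302E3235352E3233322E32332E746D313438363137313036383638333230333633"
    view_global = "1096044365.31302E3235352E3233322E32332E746D313438363137313036383638333230333633"

    print(tran_from_trace(trace_line) == hex_triplet_to_decimal("2a.0d.29575f"))
    print(scn_from_hex("b2.079538c9") == 764631398601)
    print(global_id_alert_to_view(alert_global) == view_global)
    print(decode_hex_ascii(view_global.split(".", 1)[1]))
```

The decoded text begins with an IP address, which may help identify the client or transaction manager that started the transaction.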
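A failed distributed transaction like this can keep data locked, causing other transactions to fail with ORA-01591, or keep holding an undo segment so that it cannot be reused by other transactions. It has to be cleaned up manually.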
SYS@db1> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('42.13.2709343');

PL/SQL procedure successfully completed.

SYS@db1> commit;

Commit complete.

SYS@db1> SELECT LOCAL_TRAN_ID, GLOBAL_TRAN_ID, STATE,FAIL_TIME,FORCE_TIME,RETRY_TIME, MIXED, HOST, COMMIT#
  2  FROM DBA_2PC_PENDING
  3  /

no rows selected
Cleanup complete.
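If several entries are stuck, the same two statements have to be repeated for each LOCAL_TRAN_ID. The following is a small Python sketch (a hypothetical helper, not part of any Oracle tool) that validates the ID format and prints the statements for review before they are run by hand in SQL*Plus. It assumes, as in this case, that each entry only needs purging; entries in other states may first need COMMIT FORCE or ROLLBACK FORCE, which this sketch does not decide.

```python
import re

# A local transaction ID has three dot-separated decimal parts, e.g. 42.13.2709343.
LOCAL_TRAN_ID = re.compile(r"\d+\.\d+\.\d+")


def purge_script(local_tran_ids):
    """Return SQL*Plus statements that purge each given DBA_2PC_PENDING entry.

    Raises ValueError for anything that does not look like a local
    transaction ID, so that arbitrary text never ends up in the script.
    """
    lines = []
    for tran_id in local_tran_ids:
        if LOCAL_TRAN_ID.fullmatch(tran_id) is None:
            raise ValueError(f"not a local transaction ID: {tran_id!r}")
        lines.append(f"EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('{tran_id}');")
        lines.append("COMMIT;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(purge_script(["42.13.2709343"]))
```

The script is only generated, not executed, so each statement can be checked against DBA_2PC_PENDING before it touches the database.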
Official Oracle documentation on managing distributed transactions: http://docs.oracle.com/cd/E11882_01/server.112/e25494/ds_txnman.htm#ADMIN12252
MOS documents:
How to Purge a Distributed Transaction from a Database (Doc ID 159377.1)
ORA-30019 When Executing Dbms_transaction.Purge_lost_db_entry (Doc ID 290405.1)