Common File-Loss Scenarios and Recovery

Common file losses fall into three categories:
1) loss of control files;
2) loss of data files;
3) loss of redo logs.


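I. Loss of control files
A: Some of the control files are lost
Solution: copy a surviving control file over the missing one, or edit the spfile/pfile so that it lists only the surviving copies, then restart.

B: All control files are lost
Solutions:
> Restore from a physical backup
> Recreate the control file from a backed-up creation script
Shutdown
Startup nomount
Create controlfile…..
Recover database
Alter database open
> If there is neither a backup nor a trace script, try writing the creation script by hand and recreate the control file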


II. Loss of redo logs
A: A non-current redo log is lost
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '/add/test1/test1/redo01.log'
select a.group#,b.member,a.status from v$log a,v$logfile b where a.group#=b.group#;
GROUP# MEMBER STATUS
---------- ---------------------------------------- ----------------
1 /add/test1/test1/redo01.log INACTIVE
3 /add/test1/test1/redo03.log CURRENT
2 /add/test1/test1/redo02.log INACTIVE
Solution:
alter database clear logfile '/xxx/xxx/redoxx.log';
(if the log group has not been archived yet, use "clear unarchived logfile" instead)

B: The current redo log is lost, but the database was shut down cleanly
select a.group#,b.member,a.status from v$log a,v$logfile b where a.group#=b.group#;
GROUP# MEMBER STATUS
---------- ---------------------------------------- ----------------
1 /add/test1/test1/redo01.log INACTIVE
3 /add/test1/test1/redo03.log CURRENT
2 /add/test1/test1/redo02.log INACTIVE
Solution:
Open the database with resetlogs
SQL> recover database until cancel;
Media recovery complete.
SQL> alter database open resetlogs;
Database altered.

C: A current/active redo log is lost and the database was not shut down cleanly
Attempting a fake recovery and then opening the database with resetlogs fails with:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/add/test1/test1/system01.dbf'
SQL> select file#,status,checkpoint_change#,checkpoint_time,fuzzy from v$datafile_header order by 1;
FILE# STATUS CHECKPOINT_CHANGE# CHECKPOIN FUZZY
---------- ------- -------------------------- --------- ---
1 ONLINE 4565550461182 26-MAY-13 YES
2 ONLINE 4565550461182 26-MAY-13 YES
3 ONLINE 4565550461182 26-MAY-13 YES
4 ONLINE 4565550461182 26-MAY-13 YES
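Solutions:
> Use a backup to perform point-in-time recovery
Restore old backup
SQL> startup mount
SQL> recover database until cancel using backup controlfile;
SQL> alter database open resetlogs;
> With no backup at all, the database cannot be opened normally. The only option is to force it open in an inconsistent state, export the user data as a backup, and rebuild the database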
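The three redo-log cases above amount to a small decision tree. The sketch below only restates that tree in Python; the function name and the returned labels are made up for illustration, and treating every non-INACTIVE group as "current/active" is a simplifying assumption.

```python
def redo_loss_action(status: str, clean_shutdown: bool, has_backup: bool) -> str:
    """Map a lost redo log group to the recovery path described in section II.

    status is the STATUS column of v$log for the lost group.
    The returned strings are illustrative labels, not commands to paste.
    """
    if status.upper() == "INACTIVE":
        # Case A: the group is not needed for crash recovery.
        return "A: alter database clear logfile"
    if clean_shutdown:
        # Case B: datafiles are consistent, so a cancel-based recovery suffices.
        return "B: recover database until cancel, then open resetlogs"
    if has_backup:
        # Case C with a backup: incomplete recovery from the restored files.
        return "C: restore backup, recover until cancel using backup controlfile, open resetlogs"
    # Case C without a backup: last resort.
    return "C: force open inconsistently, export user data, rebuild database"


print(redo_loss_action("CURRENT", clean_shutdown=False, has_backup=False))
```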


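III. Loss of data files
A: A non-system data file is lost
ORA-01157: cannot identify/lock data file 5 - see DBWR trace file
ORA-01110: data file 5: '/add/test1/test1/test1.dbf'
Solutions:
> Restore and recover the data file from a backup;
> Give up that data: offline drop the data file and open the database (the data stored in that file is lost)

B: A system data file is lost
> Restore and recover the data file from a backup
> Without a backup the database cannot be started, whether normally or by force, and the data cannot be recovered. If the data is critical, try salvaging it with DUL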


IV. Forced open & data salvage
Matching SCNs + NO FUZZY ======> consistent open
There are four main kinds of SCN:
1. System checkpoint SCN, recorded in V$DATABASE: checkpoint_change#
2. Datafile checkpoint SCN, recorded in V$DATAFILE: checkpoint_change#
3. Datafile start SCN, recorded in V$DATAFILE_HEADER: checkpoint_change#
4. Datafile stop SCN, recorded in V$DATAFILE: last_change#
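The "SCN + NO FUZZY" rule can be illustrated with a small check. This is a simplified sketch, not how Oracle implements it: it assumes the values have already been read from V$DATABASE and V$DATAFILE_HEADER, it ignores the stop SCN, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DatafileHeader:
    file_no: int
    checkpoint_scn: int  # V$DATAFILE_HEADER.checkpoint_change#
    fuzzy: bool          # True when V$DATAFILE_HEADER.fuzzy is 'YES'


def looks_consistent(system_scn: int, headers: list) -> bool:
    """True when every header SCN equals the system checkpoint SCN and no file is fuzzy."""
    return all(h.checkpoint_scn == system_scn and not h.fuzzy for h in headers)


# Header values in the style of the listings in this article (all files fuzzy).
after_abort = [DatafileHeader(n, 4565550830945, True) for n in (1, 2, 3, 4)]
print(looks_consistent(4565550830945, after_abort))  # False
```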

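A: NO FUZZY
Media-Recovery-Fuzzy
When a block in a datafile carries an SCN higher than the SCN in the datafile header, the file is considered to contain dirty blocks and to be in a fuzzy state; it needs more recovery to become consistent.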
SQL> shutdown abort
ORACLE instance shut down.
SQL> startup mount
Database mounted.
SQL> select file#,status,checkpoint_change#,
checkpoint_time,fuzzy from v$datafile_header order by 1;
FILE# STATUS CHECKPOINT_CHANGE# CHECKPOIN FUZ
---------- ------- ------------------ --------- ---
1 ONLINE 4565550830945 17-JUN-13 YES
2 ONLINE 4565550830945 17-JUN-13 YES
3 ONLINE 4565550830945 17-JUN-13 YES
4 ONLINE 4565550830945 17-JUN-13 YES
$ dbv file=system01.dbf blocksize=8192
........
Highest block SCN : 595433 (1063.595433) SCN_WRAP.SCN_BASE
SCN= (SCN_WRAP*4294967296)+SCN_BASE =>4565550831081
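As a quick check of the arithmetic, the dbv output can be converted to a single SCN and compared with the header checkpoint SCN from the query above. A minimal sketch; the function name is made up for illustration.

```python
SCN_BASE_RANGE = 4294967296  # 2**32, the multiplier used in the formula above


def scn_from_wrap_base(wrap: int, base: int) -> int:
    """Combine SCN_WRAP and SCN_BASE into one SCN value."""
    return wrap * SCN_BASE_RANGE + base


highest_block_scn = scn_from_wrap_base(1063, 595433)
header_checkpoint_scn = 4565550830945  # checkpoint_change# from v$datafile_header

print(highest_block_scn)                          # 4565550831081
print(highest_block_scn > header_checkpoint_scn)  # True: blocks are ahead of the header
```

The highest block SCN is ahead of the header SCN, which matches the FUZZY = YES flag reported for the file.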

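Using hidden parameters:
_allow_resetlogs_corruption
Forces the database open phase to skip consistency checks: Oracle no longer verifies what state each file was in before shutdown or how the database was shut down.
It is typically used to get past errors such as:
ORA-01190: control file or data file %s is from before the last RESETLOGS
ORA-01194: file %s needs more recovery to be consistent
ORA-01113: file '%s' needs media recovery starting at log sequence # %s
ORA-01195: on-line backup of file %s needs more recovery to be consistent
ORA-01196: file %s is inconsistent due to a failed media recovery session
ORA-01152: file '%s' was not restored from a sufficiently old backup

Before using it, note:
1. The customer has no backup
2. The data at risk is important and valuable and cannot be regenerated any other way
3. The customer is prepared to do a full export and rebuild the database
4. Setting this hidden parameter does not guarantee that the database can be forced open
5. Oracle no longer provides support for a database that was forced open with this hidden parameter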

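_corrupted_rollback_segments
During instance startup, blocks all access to the listed rollback segments; active transactions in those segments are treated as committed.
Usage:
> Set undo_management=manual
> Add _corrupted_rollback_segments=(_SYSSMU1$,…,…); the segment names can be collected with the command below
strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u
> Comment out undo_tablespace and undo_retention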
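The strings output usually contains a few lines that merely mention _SYSSMU (SQL fragments, for example), so the list needs filtering before it goes into the pfile. The sketch below is one way to do that; the function name and the regular expression are assumptions, and it writes the names without a trailing `$`, as the pfile examples in this article do.

```python
import re

# Accept names such as _SYSSMU1 or _SYSSMU10_3550978943 and nothing else.
SEGMENT_RE = re.compile(r"_SYSSMU\d+(?:_\d+)?")


def corrupted_rollback_param(lines) -> str:
    """Build a _corrupted_rollback_segments pfile line from `strings | grep` output."""
    names = sorted({ln.strip() for ln in lines if SEGMENT_RE.fullmatch(ln.strip())})
    return "_corrupted_rollback_segments=(" + ",".join(names) + ")"


sample = [
    "              and substr(drs.segment_name,1,7) != '_SYSSMU'",
    "_SYSSMU10_3550978943",
    "_SYSSMU1_3780397527",
    "_SYSSMU10_3550978943",  # duplicates are dropped
]
print(corrupted_rollback_param(sample))
```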

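Errors that may follow the use of _allow_resetlogs_corruption
1. ORA-00600:[2662]
A data block SCN is ahead of the current SCN
>_minimum_giga_scn
>event ADJUST_SCN
>event 10015
2. ORA-00600:[2662]+ORA-00704
Bootstrap error; the database cannot be recovered
3. ORA-00600:[4137]/[4138]/[4139]
After a forced open, SMON commonly runs into problems accessing undo, so _allow_resetlogs_corruption is generally combined with _corrupted_rollback_segments
4. ORA-00600:[kdsgrp1]
A scan hit a corrupt data block
>event 10231
>dbms_repair.skip_corrupt_blocks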

Simple case:
1. Database
2. List all rollback segments
-bash-3.2$ strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u
_SYSSMU10_1221199237
_SYSSMU10_1221203537
……
3. Edit the pfile as follows
#*.undo_tablespace='UNDOTBS1'
_allow_resetlogs_corruption = true
undo_management = MANUAL
_CORRUPTED_ROLLBACK_SEGMENTS = (_SYSSMU10_1221199237,…)
4. Shut down the database, restart it to the mount stage with the pfile, and force it open with resetlogs
When using _allow_resetlogs_corruption, combine it with _corrupted_rollback_segments:
>undo_management=manual
>_corrupted_rollback_segments=the full list of rollback segments
SQL> startup mount
Database mounted.
SQL> recover database until cancel;
ORA-00279: change 4565550830945 generated at 06/17/2013 00:02:46 needed for thread 1
ORA-00289: suggestion :
………..
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/add/test1/test1/system01.dbf'
ORA-01112: media recovery not started
SQL> alter database open resetlogs;
Database altered.

If the database still does not open, the following parameter is needed
SCN parameter: _minimum_giga_scn (event ADJUST_SCN/10015)
ORA-00600:[2662]
A data block SCN is ahead of the current SCN
>_minimum_giga_scn
>event ADJUST_SCN
>event 10015

Simple case:
After setting _allow_resetlogs_corruption and _CORRUPTED_ROLLBACK_SEGMENTS, opening the database with resetlogs fails with:
ORA-00600: internal error code, arguments: [2662], [1826], [1818451944], [1826], [1818507298], [322961417], [], []
ORA-00600: internal error code, arguments: [2663], [0], [637083365], [0], [637083437], [], [], [], [], [], [], []
ORA-600 [2662] [a] [b] [c] [d] [e]:
Arg [a] Current SCN WRAP: SCN WRAP of the current (control file) SCN
Arg [b] Current SCN BASE: SCN BASE of the current (control file) SCN
Arg [c] dependent SCN WRAP: SCN WRAP of the target SCN
Arg [d] dependent SCN BASE: SCN BASE of the target SCN
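To make the argument layout concrete, the sketch below pulls arguments [a] to [d] out of an ORA-600 [2662] line and turns them into the two SCNs. The function name and the dictionary keys are hypothetical; only the [2662] layout documented above is assumed.

```python
import re


def parse_ora600_2662(line: str) -> dict:
    """Extract current and dependent SCN from an ORA-600 [2662] error line."""
    args = re.findall(r"\[(\d*)\]", line)  # every bracketed argument, empty ones included
    if not args or args[0] != "2662":
        raise ValueError("not an ORA-600 [2662] line")
    a, b, c, d = (int(x) for x in args[1:5])
    return {
        "current_scn": a * 4294967296 + b,    # Arg [a], Arg [b]
        "dependent_scn": c * 4294967296 + d,  # Arg [c], Arg [d]
    }


err = ("ORA-00600: internal error code, arguments: [2662], [1826], "
       "[1818451944], [1826], [1818507298], [322961417], [], []")
scns = parse_ora600_2662(err)
print(scns["dependent_scn"] - scns["current_scn"])  # 55354: how far the block is ahead
```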

ORA-00600: internal error code, arguments: [2662], [1826], [1818451944], [1826], [1818507298], [322961417], [], []
Since SCN = (SCN_WRAP * 4294967296) + SCN_BASE:
1. The expected SCN is 1826.1818507298 = (1826 * 4294967296) + 1818507298 = 7844428789794
2. Converted to giga units, the expected SCN is 7844428789794/1024/1024/1024 = 7305.XXXX, so set _MINIMUM_GIGA_SCN=7306, slightly higher, to push the current SCN above the block SCN
3. Add _minimum_giga_scn=7306 to the pfile
4. Restart the database
Startup mount
Recover database
Alter database open
5. Once the database has opened successfully, the parameter must be removed and the database restarted again
Delete parameter _minimum_giga_scn from the init.ora file
Shutdown the database
Startup
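The giga-SCN arithmetic from steps 1 and 2 can be reproduced as follows. This is only a sketch of the calculation; the function name is made up, and rounding with "floor plus one" is a choice made here so that the result is always strictly above the target SCN.

```python
GIGA = 1024 ** 3  # one "giga" of SCN, i.e. 2**30


def minimum_giga_scn(wrap: int, base: int) -> int:
    """Smallest _minimum_giga_scn value that lies strictly above the given SCN."""
    scn = wrap * 4294967296 + base
    return scn // GIGA + 1


print(minimum_giga_scn(1826, 1818507298))  # 7306, as in step 2
print(minimum_giga_scn(0, 637083437))      # 1, for an SCN below 2**30
```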

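Scenarios for using the ADJUST_SCN event
1. ORA-00600 [2662] is raised
2. After a forced open the database keeps raising ORA-1555, or startup raises ORA-604/ORA-1555
(If startup raises ORA-704 together with ORA-1555, the event cannot be used because the error occurs during bootstrap. If the SCN gap is small, restarting repeatedly may work.)
3. The database was forced open with _ALLOW_RESETLOGS_CORRUPTION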



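Real-world case:
Database version: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
Other information: no backup, NOARCHIVELOG mode, the database cannot be opened, and several earlier recovery attempts had failed, which led to the steps below
1. Run select file#,status,checkpoint_change#,
checkpoint_time,fuzzy from v$datafile_header order by 1;
Every datafile shows FUZZY = YES: the database was shut down abnormally, so a consistency check is required at startup. Tried alter database open;
Oracle reported that the undo datafile needed recovery. Tried recover datafile xxx, but it had no effect and the check was still required;
2. List the undo segments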
[root@localhost orcl]# strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 |sort -u
              and substr(drs.segment_name,1,7) != '_SYSSMU'
D'              and substr(drs.segment_name,1,7) != ''_SYSSMU'' ' );
_SYSSMU10_3550978943
_SYSSMU10_3904554333
_SYSSMU11_286947212
_SYSSMU11_379893357
_SYSSMU12_3068564564
_SYSSMU12_3345414330
_SYSSMU13_1045611951
_SYSSMU13_2761193625
_SYSSMU1_3780397527
_SYSSMU14_1060866920
_SYSSMU14_2421411996
_SYSSMU15_1683924174
_SYSSMU15_2554699021
_SYSSMU16_2313212396
_SYSSMU16_2701506487
_SYSSMU17_1787446293
_SYSSMU17_2041439332
_SYSSMU1_783380902
_SYSSMU18_2800789714
_SYSSMU18_2983290590
_SYSSMU19_2323602401
_SYSSMU19_53723967
_SYSSMU20_2611377660
_SYSSMU20_3850939844
_SYSSMU21_158022190
_SYSSMU2_2232571081
_SYSSMU22_4293381698
_SYSSMU2_3138176977
_SYSSMU23_3502087459
_SYSSMU24_3911757283
_SYSSMU25_969882745
_SYSSMU26_4265453263
_SYSSMU27_2796764416
_SYSSMU28_2949705525
_SYSSMU29_916836042
_SYSSMU30_2951968219
_SYSSMU3_1645411166
_SYSSMU31_77080155
_SYSSMU3_2097677531
_SYSSMU32_2383226926
_SYSSMU33_2171873015
_SYSSMU34_4218399528
_SYSSMU35_1281591169
_SYSSMU36_4223850388
_SYSSMU37_3891560429
_SYSSMU38_830470978
_SYSSMU39_4269670886
_SYSSMU40_1548085334
_SYSSMU41_1028836633
_SYSSMU4_1152005954
_SYSSMU42_180022750
_SYSSMU43_319060321
_SYSSMU44_934106214
_SYSSMU45_2860974049
_SYSSMU46_3982067532
_SYSSMU47_11933987
_SYSSMU4_870421980
_SYSSMU48_724432902
_SYSSMU49_3198896008
_SYSSMU50_1790859891
_SYSSMU51_4188899104
_SYSSMU5_1527469038
_SYSSMU52_288285783
_SYSSMU5_2525172762
_SYSSMU53_311086950
_SYSSMU54_1597897898
_SYSSMU55_1028194913
_SYSSMU56_2625382688
_SYSSMU57_1912349309
_SYSSMU58_1635312664
_SYSSMU6_2443381498
_SYSSMU6_3753507049
_SYSSMU7_1260614213
_SYSSMU7_3286610060
_SYSSMU8_2012382730
_SYSSMU8_2806087761
_SYSSMU9_1424341975
_SYSSMU9_973944058
[root@localhost orcl]#

Add the following parameters to the pfile and comment out the undo_tablespace parameter
_allow_resetlogs_corruption=true
undo_management=MANUAL
_CORRUPTED_ROLLBACK_SEGMENTS=(_SYSSMU10_3550978943,_SYSSMU10_3904554333,_SYSSMU11_286947212,_SYSSMU11_379893357,_SYSSMU12_3068564564,_SYSSMU12_3345414330,_SYSSMU13_1045611951,_SYSSMU13_2761193625,_SYSSMU1_3780397527,_SYSSMU14_1060866920,_SYSSMU14_2421411996,_SYSSMU15_1683924174,_SYSSMU15_2554699021,_SYSSMU16_2313212396,_SYSSMU16_2701506487,_SYSSMU17_1787446293,_SYSSMU17_2041439332,_SYSSMU1_783380902,_SYSSMU18_2800789714,_SYSSMU18_2983290590,_SYSSMU19_2323602401,_SYSSMU19_53723967,_SYSSMU20_2611377660,_SYSSMU20_3850939844,_SYSSMU21_158022190,_SYSSMU2_2232571081,_SYSSMU22_4293381698,_SYSSMU2_3138176977,_SYSSMU23_3502087459,_SYSSMU24_3911757283,_SYSSMU25_969882745,_SYSSMU26_4265453263,_SYSSMU27_2796764416,_SYSSMU28_2949705525,_SYSSMU29_916836042,_SYSSMU30_2951968219,_SYSSMU3_1645411166,_SYSSMU31_77080155,_SYSSMU3_2097677531,_SYSSMU32_2383226926,_SYSSMU33_2171873015,_SYSSMU34_4218399528,_SYSSMU35_1281591169,_SYSSMU36_4223850388,_SYSSMU37_3891560429,_SYSSMU38_830470978,_SYSSMU39_4269670886,_SYSSMU40_1548085334,_SYSSMU41_1028836633,_SYSSMU4_1152005954,_SYSSMU42_180022750,_SYSSMU43_319060321,_SYSSMU44_934106214,_SYSSMU45_2860974049,_SYSSMU46_3982067532,_SYSSMU47_11933987,_SYSSMU4_870421980,_SYSSMU48_724432902,_SYSSMU49_3198896008,_SYSSMU50_1790859891,_SYSSMU51_4188899104,_SYSSMU5_1527469038,_SYSSMU52_288285783,_SYSSMU5_2525172762,_SYSSMU53_311086950,_SYSSMU54_1597897898,_SYSSMU55_1028194913,_SYSSMU56_2625382688,_SYSSMU57_1912349309,_SYSSMU58_1635312664,_SYSSMU6_2443381498,_SYSSMU6_3753507049,_SYSSMU7_1260614213,_SYSSMU7_3286610060,_SYSSMU8_2012382730,_SYSSMU8_2806087761,_SYSSMU9_1424341975,_SYSSMU9_973944058)

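Start the database to the mount stage with this pfile, then run recover database until cancel and open with resetlogs. The alert log shows: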
ALTER DATABASE RECOVER UNTIL
 Media Recovery Start
 started logmerger process
Parallel Media Recovery started with 8 slaves
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel  ...
ALTER DATABASE RECOVER    CANCEL  
Wed Jul 30 09:25:30 2014
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_3896.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u02/oradata/orcl/system01.dbf'
Slave exiting with ORA-1547 exception
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_pr00_3896.trc:
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/u02/oradata/orcl/system01.dbf'
ORA-10879 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
ALTER DATABASE RECOVER CANCEL
Media Recovery Canceled
Completed: ALTER DATABASE RECOVER CANCEL
Wed Jul 30 09:26:27 2014
ALTER database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 637040346
Resetting resetlogs activation ID 1351186164 (0x508976f4)
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3154.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 1 of thread 1 is not current copy
ORA-00312: online log 1 thread 1: '/u02/oradata/orcl/redo01.log'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3154.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 2 of thread 1 is not current copy
ORA-00312: online log 2 thread 1: '/u02/oradata/orcl/redo02.log'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3154.trc:
ORA-00367: checksum error in log file header
ORA-00322: log 3 of thread 1 is not current copy
ORA-00312: online log 3 thread 1: '/u02/oradata/orcl/redo03.log'
Wed Jul 30 09:26:31 2014
Setting recovery target incarnation to 2
Wed Jul 30 09:26:31 2014
Assigning activation ID 1382066206 (0x5260a81e)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: /u02/oradata/orcl/redo01.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jul 30 09:26:31 2014
SMON: enabling cache recovery
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3154.trc  (incident=74556):
ORA-00600: internal error code, arguments: [2663], [0], [637083365], [0], [637083437], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl/incident/incdir_74556/orcl_ora_3154_i74556.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3154.trc:
ORA-00600: internal error code, arguments: [2663], [0], [637083365], [0], [637083437], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_3154.trc:
ORA-00600: internal error code, arguments: [2663], [0], [637083365], [0], [637083437], [], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 3154): terminating the instance due to error 600
Instance terminated by USER, pid = 3154
ORA-1092 signalled during: ALTER database open resetlogs...
opiodr aborting process unknown ospid (3154) as a result of ORA-1092

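The database failed to open, so the next step is to bump the SCN:
Add _minimum_giga_scn=1 to the pfile
Restart and open the database; it starts normally
Remove this parameter, recreate the temporary tablespace, and export the data into a new database;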
