NBU version: 7.5
Media server: Windows Server 2008 R2
Backup content: SQL Server data
Tape library: IBM 3584
The Activity Monitor shows the following:
Info nbjm(pid=7004) started backup (backupid=xxxx_1379096131) job for client xxxx, policy centralDWH, schedule full on storage unit xxxx-hcart2-robot-tld-0
9/14/2013 2:15:33 AM - started process bpbrm (14008)
9/14/2013 2:15:34 AM - connecting
9/14/2013 2:15:34 AM - connected; connect time: 00:00:00
9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)
9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.
9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - Results of executing <C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch>:
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) <0> operations succeeded. <1> operations failed.
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - The following object(s) were not backed up successfully.
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - CentralDWH
The SQL Server log for the same period:
Date | Source | Severity | Message
09/14/2013 02:20:15 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:20:15 | Backup | Unknown | Error: 3041
09/14/2013 02:04:57 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:04:57 | Backup | Unknown | Error: 3041
Problem analysis:
First, these lines in the job log:
Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)
Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up
show that the bch script failed: none of the requested database files were backed up (bphdb exit status 2).
Next, this part:
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.
9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.
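shows that NBU's connection to the VDI timed out. The default VDI timeout is 300 seconds; because the client never received the database data it requested, the script timed out after 300 seconds and VDI reported an error. At the same time, the Windows Server event log recorded an error with the same information: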
SQLVDI: Loc=SignalAbort. Desc=Client initiates abort
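The timing argument can be checked directly against the Activity Monitor entries. Below is a minimal Python sketch, assuming the `M/D/YYYY H:MM:SS AM` timestamp format shown in the job log above; the helper name `seconds_between` is hypothetical. It measures the gap between the bpbrm start and the GetConfiguration error, which should land just above the 300-second VDI timeout once startup overhead is included.

```python
from datetime import datetime

# Timestamp format of the Activity Monitor lines above, e.g. "9/14/2013 2:15:33 AM".
FMT = "%m/%d/%Y %I:%M:%S %p"

def seconds_between(start: str, end: str) -> int:
    """Return the whole seconds elapsed between two Activity Monitor timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# bpbrm started at 2:15:33; the GetConfiguration timeout was logged at 2:20:41.
gap = seconds_between("9/14/2013 2:15:33 AM", "9/14/2013 2:20:41 AM")
print(gap)  # prints 308
```

A gap of a few seconds over 300, together with the batch's reported elapsed time of 310 seconds, is what you would expect if the client sat waiting for the full VDI timeout.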
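Since the script had not run successfully, I checked the bch script but found nothing wrong with it. I then reran the policy manually and NBU failed again, but this time the error was not about the script: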
INF - Created VDI object for SQL Server instance <xxxx>. Connection timeout is <300> seconds.
ERR - Error in GetConfiguration: 0x80770003.
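After the VDI object was created, the client waited 300 seconds and then reported Error in GetConfiguration: 0x80770003 again. So the problem seems to lie with the VDI object, which the NBU client creates by calling SQLVDI.DLL.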
Next, look at the dbclient log. NetBackup only writes this log if a dbclient folder has been created under NetBackup\logs:
<2> logconnections: BPRD CONNECT FROM media-ip.62961 TO master-ip.1556 fd = 1268
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_34284_37776_1>, SQL userid <sa> handle <0x0080d1b0>.
<4> CDBbackrec::InitDeviceSet(): INF - Created VDI object for SQL Server instance <instance>. Connection timeout is <300> seconds.  ---- the VDI object is created here
<2> vnet_pbxConnect: pbxConnectEx Succeeded
<2> logconnections: BPRD CONNECT FROM media-ip.62962 TO master-ip.1556 fd = 1396
<2> vnet_pbxConnect: pbxConnectEx Succeeded
<2> logconnections: BPRD CONNECT FROM media-ip.62963 TO master-ip.1556 fd = 952
<4> CGlobalInformation::VCSVirtualNameList: INF - Veritas Cluster Server is not installed.  ---- Veritas Cluster Server is not installed
<1> CGlobalInformation::VCSVirtualNameList: CONTINUATION: - The system cannot find the path specified.  ---- path not found
<4> getServerName: Read server name from nb_master_config: xxxxx
<4> CDBIniParms::CDBIniParms: INF - NT User is Administrator
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065acf0>.  ---- sa logs in to SQL Server with handle 0x0065acf0
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065c260>.  ---- sa logs in to SQL Server with handle 0x0065c260
<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxx\instance> has been made using Trusted security with DSN <NBMSSQL_temp_23736_9600_1> using standard userid <sa>.
<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c260>  ---- handle 0x0065c260 logs out
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065c690>.  ---- another sa login
<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c690>  ---- logs out immediately
<4> SQLEnumerator: INF - Enumerated SQL hosts: SERVER:Server={BJDSQLCLUSTER\instance};UID:Login ID=?;PWD:Password=?;Trusted_Connection:Use Integrated Security=?;*APP:AppName=?;*WSID:WorkStation ID=?
01:17:34.156 [23736.9600] <4> SQLEnumerator: INF - Could not enumerate Local SQL host/instance using SQLBrowseConnectW  ---- SQLBrowseConnect could not enumerate the local SQL host and instance. SQLBrowseConnect is used to discover and enumerate the values needed to connect to the database (host name, instance name, etc.)
<4> CGlobalInformation::SQLEnumerator: INF - Hosts and instances retrieved from host list string
<4> CGlobalInformation::SQLEnumerator: INF - host: mediaserver
<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxx
<4> CGlobalInformation::SQLEnumerator: INF - host: BJDSQLCLUSTER
<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxxx
<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxx\instance> has been made using Trusted security with DSN <NBMSSQL_23736_9600_2> using standard userid <sa>.  ---- the host name and instance were found in the host list and the connection succeeded. At this point the NBU client is connected to the database instance; next, look at why the backup still did not succeed.
----------------------------------------
<4> StartupProcess: INF - Starting: <C:\Program Files\Veritas\NetBackup\bin\admincmd\bppllist.exe -byclient mediaserver>
(another batch of login messages follows, ending in a successful database connection; omitted here)
<4> getServerName: Read server name from nb_master_config: masterserver
<2> vnet_pbxConnect: pbxConnectEx Succeeded
<2> logconnections: BPRD CONNECT FROM media-ip.62996 TO master-ip.1556 fd = 960  ---- the media server connects to bprd on the master server
<16> writeToServer: ERR - send() to server on socket failed:  ---- sending on the socket failed
<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket
<16> CDBbackrec::InitDeviceSet_Part2(): ERR - Error in GetConfiguration: 0x80770003.  ---- the same error as in the Activity Monitor
01:22:09.551 <1> CDBbackrec::InitDeviceSet_Part2(): CONTINUATION: - The api was waiting and the timeout interval had elapsed.
<2> vnet_pbxConnect: pbxConnectEx Succeeded
<2> logconnections: BPRD CONNECT FROM media-ip.63001 TO master-ip.1556 fd = 1400
01:22:09.703 <4> KillAllThreads: INF - Killing group #0
01:22:09.704 [34284.33648] <4> KillAllThreads: INF - Killing group #0
01:22:09.704 <4> KillAllThreads: INF - Issuing SignalAbort to MS SQL Server VDI  ---- this is the message seen in the Windows event log
01:22:09.704 [34284.33416] <4> KillAllThreads: INF - Killing group #0
01:22:09.704 [34284.32560] <4> KillAllThreads: INF - Killing group #0
01:22:12.709 <2> vnet_pbxConnect: pbxConnectEx Succeeded
01:22:12.710 <2> logconnections: BPRD CONNECT FROM media-ip.63002 TO master-ip.1556 fd = 1276
01:22:14.546 <16> writeToServer: ERR - send() to server on socket failed:
<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket
<16> CDBbackrec::FreeDeviceSet(): ERR - Error in VDS->Close: 0x80770004.
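A full dbclient log is much longer than this excerpt, so it helps to pull out only the failure lines. In this excerpt the error lines are tagged `<16>` and their continuation lines `<1>`; the Python sketch below simply filters on those two tags. The function name `error_lines` is hypothetical, and the tag-based filter is an assumption drawn from the lines shown here, not a documented NetBackup log format.

```python
import re

# In the excerpt above, ERR lines carry "<16>" and CONTINUATION lines carry "<1>".
ERROR_TAG = re.compile(r"<(16|1)>\s")

def error_lines(log_text: str) -> list[str]:
    """Return the stripped lines tagged <16> or <1> from a dbclient log excerpt."""
    return [line.strip() for line in log_text.splitlines() if ERROR_TAG.search(line)]

sample = """\
01:22:09.703 <4> KillAllThreads: INF - Killing group #0
01:22:09.704 <4> KillAllThreads: INF - Issuing SignalAbort to MS SQL Server VDI
01:22:14.546 <16> writeToServer: ERR - send() to server on socket failed:
<16> CDBbackrec::FreeDeviceSet(): ERR - Error in VDS->Close: 0x80770004."""

for line in error_lines(sample):
    print(line)
```

Only the two `<16>` lines from the sample are printed; the `<4>` informational lines are dropped.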
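So the cause appears to be that the client could not write its progress status message to the NAME socket (its connection to bprd on the master server). Communication between the media server and the master server failed, and that in turn led to the VDI timeout.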
http://www.symantec.com/business/support/index?page=content&id=TECH182435
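This article says that in 7.1, the message "dbc_RemoteWriteFile- RemoteWriteFile status = 0" (status 0) can be ignored and will be fixed in the next release. But I am on 7.5, so this does not seem to be the issue.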
http://www.symantec.com/docs/TECH146444 describes a SQL Server patch that updated SQLVDI.DLL and caused backups to fail. That was not my problem either.
http://www.symantec.com/connect/forums/having-problem-mssql-agent-backup suggests two fixes:
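1. Kill the dbbackex.exe process.
2. Increase the client connect time, i.e. the Client Read Timeout. You can also add VDITIMEOUTSECONDS XXXX to the bch script to set the timeout for the connection between NBU and VDI (see the NetBackup for Microsoft SQL Server Administrator’s Guide for this parameter).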
Note:
Before running another backup, ensure the following log folders exist on media server: bptm and bpbrm. If backup still fails after increasing media server timeouts, please check a new set of logs: dbclient on SQL client, bptm and bpbrm on media server.
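Creating the log folders is easy to script. In this setup the media server also hosts SQL Server, so dbclient, bptm and bpbrm can all go under one logs directory. The Python sketch below assumes the default install path that appears in the logs above (`C:\Program Files\Veritas\NetBackup`); the function name `ensure_log_dirs` is hypothetical, and it only creates folders, it does not change any NetBackup settings.

```python
from pathlib import Path

# Assumed default logs directory, based on the install path seen in the logs above.
DEFAULT_LOGS = Path(r"C:\Program Files\Veritas\NetBackup\logs")

def ensure_log_dirs(logs_root: Path, names=("dbclient", "bptm", "bpbrm")) -> list[Path]:
    """Make sure each legacy log folder exists under logs_root; return their paths."""
    folders = []
    for name in names:
        folder = logs_root / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        folders.append(folder)
    return folders

# On the media server: ensure_log_dirs(DEFAULT_LOGS)
```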
Solution
After adding VDITIMEOUTSECONDS 1800 to the script, the manual backup succeeded.
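If a policy uses several batch files, the keyword can be added programmatically. The Python sketch below inserts `VDITIMEOUTSECONDS` into every operation that does not already set it. It assumes the usual .bch layout in which each operation starts with an `OPERATION` line and ends with `ENDOPER TRUE`, and that placing the keyword just before `ENDOPER` is acceptable; the function name `add_vdi_timeout` and the sample batch text are hypothetical, and the NetBackup for Microsoft SQL Server Administrator’s Guide is the authority on batch-file syntax.

```python
def add_vdi_timeout(bch_text: str, seconds: int = 1800) -> str:
    """Insert 'VDITIMEOUTSECONDS <seconds>' before each ENDOPER line lacking one.

    Assumes each operation begins with OPERATION and ends with ENDOPER.
    """
    out = []
    has_timeout = False
    for line in bch_text.splitlines():
        parts = line.split()
        keyword = parts[0].upper() if parts else ""
        if keyword == "OPERATION":
            has_timeout = False  # new operation: no timeout seen yet
        elif keyword == "VDITIMEOUTSECONDS":
            has_timeout = True  # operation already sets its own timeout
        elif keyword == "ENDOPER" and not has_timeout:
            out.append(f"VDITIMEOUTSECONDS {seconds}")
            has_timeout = True
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical minimal batch file for illustration.
original = 'OPERATION BACKUP\nDATABASE "CentralDWH"\nENDOPER TRUE\n'
print(add_vdi_timeout(original))
```

Running it a second time on its own output leaves the file unchanged, because operations that already contain the keyword are skipped.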
Notes:
Error codes 0x80770003 and 0x80770004, along with other VDI errors, are explained in detail at http://www.sqlbackuprestore.com/vdierrors.htm:
0x80770003 (-2139684861) | The api was waiting and the timeout interval had elapsed. This can happen when the backup application has waited a set amount of time for SQL Server to respond to its backup request but did not receive any response.
0x80770004 (-2139684860) | An abort request is preventing anything except termination actions. For example, this error occurs when the backup software has encountered a critical error and has issued an abort request to the VDI.
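The negative numbers in parentheses are the same 32-bit HRESULT values read as signed integers. The Python sketch below splits an HRESULT using the standard Windows bit layout (bit 31 = failure, bits 16-26 = facility, bits 0-15 = code). The function name `decode_hresult` is hypothetical, and it does not look up any VDI message text; it only shows where the two numbers come from and that the two errors differ only in their low code.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its standard fields and its signed decimal form."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return {
        "signed": signed,                     # the value as a signed 32-bit integer
        "failure": bool(value & 0x80000000),  # severity bit 31
        "facility": (value >> 16) & 0x7FF,    # bits 16-26
        "code": value & 0xFFFF,               # bits 0-15
    }

for hr in (0x80770003, 0x80770004):
    print(hex(hr), decode_hresult(hr))
```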
http://www.symantec.com/business/support/index?page=content&id=TECH38369
Postscript
Backup flow: NBU policy -> NBU backup script -> VDI on the media server -> database backup processes on the media server
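The media server runs the local script, which communicates through VDI with a set of backup processes inside SQL Server, three processes per database being backed up. When the backup completes, these processes should be torn down and should notify the media server through VDI; the media server then completes the backup.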
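If the SQL Server backup processes cannot finish within N seconds (N is the timeout set in the script), they cannot notify the media server through VDI in time, and NBU treats the backup as failed. If those processes still exist when the next backup runs, that backup will fail as well.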
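The wait-then-give-up behavior described above can be illustrated with a toy model. In the Python sketch below, a thread stands in for the SQL Server backup processes and a `threading.Event` stands in for the completion notice sent through VDI; the function name `run_with_timeout` and all parameters are hypothetical, and this is not how NetBackup or VDI is actually implemented. It shows the two points from the text: the caller fails once the timeout elapses, and the "backup process" can still be running afterwards.

```python
import threading
import time

def run_with_timeout(work_seconds: float, timeout_seconds: float):
    """Toy model of the VDI wait: give up if the worker has not signalled in time.

    Returns the status string and the worker thread, so the caller can check
    whether the simulated backup process is still alive after a timeout.
    """
    done = threading.Event()

    def worker():
        time.sleep(work_seconds)  # stands in for SQL Server reading the database
        done.set()                # stands in for the completion notice sent through VDI

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    status = "success" if done.wait(timeout_seconds) else "failed: timeout"
    return status, t

status, worker = run_with_timeout(work_seconds=0.5, timeout_seconds=0.1)
print(status, worker.is_alive())  # prints: failed: timeout True
```

Raising the timeout (as VDITIMEOUTSECONDS does for the real job) lets the same slow worker finish: `run_with_timeout(0.5, 2)` returns `"success"`.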
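A slow backup may be caused by insufficient SQL Server performance, which makes the backup processes run slowly.
Reflection
SQL Server performance should be improved.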