NBU media server backup fails with status code 2

NBU version: 7.5

Media server: Windows Server 2008 R2

Backup content: SQL Server data

Tape library: IBM 3584

The Activity Monitor shows the following:

Info nbjm(pid=7004) started backup (backupid=xxxx_1379096131) job for client xxxx, policy centralDWH, schedule full on storage unit xxxx-hcart2-robot-tld-0

9/14/2013 2:15:33 AM - started process bpbrm (14008)

9/14/2013 2:15:34 AM - connecting

9/14/2013 2:15:34 AM - connected; connect time: 00:00:00

9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)

9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.

9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - Results of executing <C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch>:

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) <0> operations succeeded. <1> operations failed.

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - The following object(s) were not backed up successfully.

9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - CentralDWH

SQL Server log for the same time period:

Date | Source | Severity | Message
09/14/2013 02:20:15 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:20:15 | Backup | Unknown | Error: 3041, Severity: 16, State: 1.
09/14/2013 02:04:57 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:04:57 | Backup | Unknown | Error: 3041, Severity: 16, State: 1.


Problem analysis:

First, these lines in the log:

Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)

Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up

show that the bch script failed to run and that none of the database files requested for backup were found.

Next, this part:

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.

9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.

9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.

9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.

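shows that NBU's connection to VDI timed out. The VDI timeout normally defaults to 300 seconds. Because the database files were never obtained, the script timed out after 300 seconds and VDI reported an error. At the same time, the Windows Server event log recorded an error with the same information: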

SQLVDI: Loc=SignalAbort. Desc=Client initiates abort

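Since the script apparently had not run, I checked the bch script and found nothing wrong with it. I then reran the policy manually, and NBU failed again, but this time the error did not point at the script: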

INF - Created VDI object for SQL Server instance <xxxx>. Connection timeout is <300> seconds.
ERR - Error in GetConfiguration: 0x80770003.

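After the VDI object was created, the job waited 300 seconds and then hit Error in GetConfiguration: 0x80770003 again. So the problem seems to lie in creating the VDI object, which the NBU client presumably does by calling SQLVDI.DLL.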
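The timeout reasoning above can be checked mechanically. The following is a minimal Python sketch, not part of NetBackup: it pulls the VDI connection timeout, the elapsed time, and the VDI error codes out of dbclient messages like the ones quoted in this post, and flags an operation whose elapsed time reached the timeout. The function name summarize and the regular expressions are assumptions made for illustration, written only against the message formats shown here. The sample combines lines quoted from the two runs.

```python
import re

# Patterns written against the dbclient messages quoted in this post.
TIMEOUT_RE = re.compile(r"Connection timeout is <(\d+)> seconds")
ELAPSED_RE = re.compile(r"Elapsed time = (\d+)\(\d+\) seconds")
VDI_ERR_RE = re.compile(r"Error in ([\w>\-]+): (0x[0-9A-Fa-f]{8})")


def summarize(log_text, default_timeout=300):
    """Return the VDI timeout, the elapsed seconds, and the VDI errors found."""
    timeout_match = TIMEOUT_RE.search(log_text)
    elapsed_match = ELAPSED_RE.search(log_text)
    # Fall back to 300 seconds, the default mentioned in the post.
    timeout = int(timeout_match.group(1)) if timeout_match else default_timeout
    elapsed = int(elapsed_match.group(1)) if elapsed_match else None
    return {
        "timeout": timeout,
        "elapsed": elapsed,
        "errors": VDI_ERR_RE.findall(log_text),
        "timed_out": elapsed is not None and elapsed >= timeout,
    }


# Lines copied from the job details quoted above.
SAMPLE = r"""INF - Created VDI object for SQL Server instance <xxxx>. Connection timeout is <300> seconds.
ERR - Error in GetConfiguration: 0x80770003.
ERR - Error in VDS->Close: 0x80770004.
INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.
"""

print(summarize(SAMPLE))
```

An elapsed time just above the timeout, as here (310 against 300), suggests the operation spent its whole life waiting on VDI rather than failing early for some other reason.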

Next, look at the dbclient log. This log is only written if a dbclient folder has been created under NetBackup\logs:
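Creating the log folder is a one-off step that is easy to forget. Here is a small Python sketch of it, under the assumption that NetBackup is installed in the default location implied by the batch file path in the job details (C:\Program Files\Veritas\NetBackup). The helper name ensure_log_dirs is hypothetical, and the same helper can create the bptm and bpbrm folders on the media server.

```python
from pathlib import Path

# Assumption: default install path, inferred from the batch file path shown in
# the job details. Adjust it if NetBackup is installed elsewhere.
DEFAULT_LOGS_DIR = Path(r"C:\Program Files\Veritas\NetBackup\logs")


def ensure_log_dirs(logs_dir, names=("dbclient",)):
    """Create the named log folders if they are missing and return their paths."""
    folders = []
    for name in names:
        folder = Path(logs_dir) / name
        # exist_ok makes the call safe to repeat on a host that already has the folder.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders
```

On the SQL client one would call ensure_log_dirs(DEFAULT_LOGS_DIR), and on the media server ensure_log_dirs(DEFAULT_LOGS_DIR, ("bptm", "bpbrm")), then rerun the backup so the processes start writing logs.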

<2> logconnections: BPRD CONNECT FROM media-ip.62961 TO master-ip.1556 fd = 1268

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_34284_37776_1>, SQL userid <sa> handle <0x0080d1b0>.

<4> CDBbackrec::InitDeviceSet(): INF - Created VDI object for SQL Server instance <instance>. Connection timeout is <300> seconds. ---- the VDI object is created here

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.62962 TO master-ip.1556 fd = 1396

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.62963 TO master-ip.1556 fd = 952

<4> CGlobalInformation::VCSVirtualNameList: INF - Veritas Cluster Server is not installed. ---- shows that Veritas clustering is not installed

<1> CGlobalInformation::VCSVirtualNameList: CONTINUATION: - The system cannot find the path specified. ---- path not found

<4> getServerName: Read server name from nb_master_config: xxxxx

<4> CDBIniParms::CDBIniParms: INF - NT User is Administrator

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065acf0>. ---- sa logs in to SQL Server (handle 0x0065acf0)

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065c260>. ---- sa logs in to SQL Server (handle 0x0065c260)

<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxx\instance> has been made using Trusted security with DSN <NBMSSQL_temp_23736_9600_1> using standard userid <sa>.

<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c260> ---- sa logs out (handle 0x0065c260)

<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>, SQL userid <sa> handle <0x0065c690>. ---- another sa login

<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c690> ---- followed immediately by a logout

<4> SQLEnumerator: INF - Enumerated SQL hosts: SERVER:Server={BJDSQLCLUSTER\instance};UID:Login ID=?;PWD:Password=?;Trusted_Connection:Use Integrated Security=?;*APP:AppName=?;*WSID:WorkStation ID=?

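01:17:34.156 [23736.9600] <4> SQLEnumerator: INF - Could not enumerate Local SQL host/instance using SQLBrowseConnectW ---- the local SQL host and instance could not be enumerated with SQLBrowseConnect, the function used to discover and enumerate the values needed to connect to a database (host name, instance name, and so on)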

<4> CGlobalInformation::SQLEnumerator: INF - Hosts and instances retrieved from host list string

<4> CGlobalInformation::SQLEnumerator: INF - host: mediaserver

<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxx

<4> CGlobalInformation::SQLEnumerator: INF - host: BJDSQLCLUSTER

<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxxx

<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxx\instance> has been made using Trusted security with DSN <NBMSSQL_23736_9600_2> using standard userid <sa>. ---- the host name and instance were found in the host list and the connection succeeded. So far this shows that the NBU client did connect to the database instance. Next, let's see why the backup did not succeed.

------------------------------------------------------- divider --------------------------------------------

<4> StartupProcess: INF - Starting: <C:\Program Files\Veritas\NetBackup\bin\admincmd\bppllist.exe -byclient mediaserver>

(Another batch of login messages follows here, all connecting to the database successfully. They are omitted.)

<4> getServerName: Read server name from nb_master_config: masterserver

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.62996 TO master-ip.1556 fd = 960 ---- bprd connection from the media server to the master

<16> writeToServer: ERR - send() to server on socket failed: ---- sending on the socket failed

<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket


<16> CDBbackrec::InitDeviceSet_Part2(): ERR - Error in GetConfiguration: 0x80770003. ---- the same error as in the Activity Monitor

01:22:09.551 <1> CDBbackrec::InitDeviceSet_Part2(): CONTINUATION: - The api was waiting and the timeout interval had elapsed.

<2> vnet_pbxConnect: pbxConnectEx Succeeded

<2> logconnections: BPRD CONNECT FROM media-ip.63001 TO master-ip.1556 fd = 1400

01:22:09.703 <4> KillAllThreads: INF - Killing group #0

01:22:09.704 [34284.33648] <4> KillAllThreads: INF - Killing group #0

01:22:09.704 <4> KillAllThreads: INF - Issuing SignalAbort to MS SQL Server VDI ---- the message seen in the Windows event log

01:22:09.704 [34284.33416] <4> KillAllThreads: INF - Killing group #0

01:22:09.704 [34284.32560] <4> KillAllThreads: INF - Killing group #0


01:22:12.709 <2> vnet_pbxConnect: pbxConnectEx Succeeded

01:22:12.710 <2> logconnections: BPRD CONNECT FROM media-ip.63002 TO master-ip.1556 fd = 1276

01:22:14.546 <16> writeToServer: ERR - send() to server on socket failed:

<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket

<16> CDBbackrec::FreeDeviceSet(): ERR - Error in VDS->Close: 0x80770004.

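So the cause of the failure appears to be that bprd could not write the progress status to the NAME socket, which broke communication between the media server and the master server and in turn caused the VDI timeout.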

http://www.symantec.com/business/support/index?page=content&id=TECH182435

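This note says that in version 7.1 a dbc_RemoteWriteFile message with RemoteWriteFile status = 0 can be ignored and will be fixed in the next release. I am on 7.5, though, so this does not seem to be my problem.

http://www.symantec.com/docs/TECH146444 This article mentions that a SQL Server patch updated SQLVDI.DLL and caused backups to fail. That is not my problem either.

http://www.symantec.com/connect/forums/having-problem-mssql-agent-backup This thread suggests two fixes:

1. Kill the dbbackex.exe process.
2. Increase the Client Connect time, that is, the Client Read Timeout. In addition, VDITIMEOUTSECONDS XXXX can be added to the bch script (see the NetBackup for Microsoft SQL Server Administrator’s Guide for this parameter) to set the timeout for the connection between NBU and VDI.

Note: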

Before running another backup, ensure the following log folders exist on media server:

bptm and bpbrm.

If backup still fails after increasing media server timeouts, please check a new set of logs:

dbclient on SQL client, bptm and bpbrm on media server.



Solution

After adding VDITIMEOUTSECONDS 1800 to the script, a manual backup succeeded.
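For anyone with many batch files to update, the edit can be scripted. The following Python sketch is only an illustration: it assumes each operation in a bch file is a block of KEYWORD value lines closed by an ENDOPER line, and it inserts or updates VDITIMEOUTSECONDS inside every such block. The function name set_vdi_timeout is hypothetical, and the sample batch content is made up, because the real centralDWH.bch is not shown in this post. Check the result against the Administrator's Guide before using it.

```python
def set_vdi_timeout(batch_text, seconds):
    """Set VDITIMEOUTSECONDS in every operation of a bch batch file text."""
    out = []
    seen = False  # True once the current operation already carries the keyword
    for line in batch_text.splitlines():
        words = line.split()
        keyword = words[0].upper() if words else ""
        if keyword == "VDITIMEOUTSECONDS":
            # Replace an existing value instead of adding a second line.
            out.append(f"VDITIMEOUTSECONDS {seconds}")
            seen = True
            continue
        if keyword == "ENDOPER":
            # Assumption: ENDOPER closes the operation, so add the keyword before it.
            if not seen:
                out.append(f"VDITIMEOUTSECONDS {seconds}")
            seen = False
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical batch file content, for illustration only.
SAMPLE_BCH = """\
OPERATION BACKUP
DATABASE "CentralDWH"
SQLHOST "xxxx"
SQLINSTANCE "instance"
ENDOPER TRUE
"""

print(set_vdi_timeout(SAMPLE_BCH, 1800))
```

Running the function a second time with a different value rewrites the existing line rather than duplicating it, so the script is safe to rerun.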


Notes:

The error codes 0x80770003 and 0x80770004, along with other VDI errors, are explained in detail at http://www.sqlbackuprestore.com/vdierrors.htm

0x80770003 (-2139684861)

The api was waiting and the timeout interval had elapsed.

Similar to the above example, this can happen when the backup application has waited a set amount of time waiting for SQL Server to respond to its backup request, but did not receive any response.

0x80770004 (-2139684860)

An abort request is preventing anything except termination actions.

An example of this error is when the backup software has encountered a critical error, and has issued an abort request to the VDI.
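The number in parentheses after each code is the same 32-bit value read as a signed integer, which is how some tools and logs print it. A short Python sketch of the conversion, with the two messages copied from the list above (the helper name hresult_to_signed is made up for this example):

```python
def hresult_to_signed(code):
    """Interpret an unsigned 32-bit error code as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    # If the top bit is set, the signed value is the code minus 2**32.
    return code - 0x100000000 if code & 0x80000000 else code


# Messages as quoted above for the two codes seen in this failure.
VDI_ERRORS = {
    0x80770003: "The api was waiting and the timeout interval had elapsed.",
    0x80770004: "An abort request is preventing anything except termination actions.",
}

for code, message in VDI_ERRORS.items():
    print(f"0x{code:08X} ({hresult_to_signed(code)}): {message}")
```

This makes it easier to match a negative decimal code in one log against the hexadecimal form in another.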

A good document on troubleshooting NBU on SQL Server:

http://www.symantec.com/business/support/index?page=content&id=TECH38369


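Afterword

Backup flow: NBU policy -> NBU backup script -> media server VDI -> media server DB process

The media server runs the local script and communicates through VDI with a group of backup processes inside SQL Server. Each database being backed up has three processes. When the backup finishes, these processes should be destroyed and the media server notified through VDI, after which the media server completes the backup.

If the SQL Server backup processes cannot finish the backup within N seconds (N being the timeout set in the script), they cannot notify the media server through VDI, and NBU treats the backup as failed. If those processes still exist when a second backup starts, that backup fails as well.
One possible reason for the backup being this slow is that the SQL Server host is underpowered, so the processes run slowly.

Thoughts

The performance of the SQL Server host should be improved.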
