Understanding the Oracle RAC Brain Split Resolution Protocol

How CSS Works

Before looking at how a brain split is handled, it helps to review the framework of Oracle RAC CSS (Cluster Synchronization Services):

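Oracle RAC CSS provides two background services: Group Management (GM) and Node Monitor (NM). GM manages the group and lock services. At any moment one node in the cluster acts as the GM master node. The other nodes send their GM requests serially to the master node, and the master node broadcasts cluster membership changes to the other nodes. Group membership is synchronized on every cluster reconfiguration. Each node interprets the membership change information independently.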

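The Node Monitor (NM) service keeps node information consistent with third-party vendor clusterware through skgxn (skgxn-libskgxn.a, the library that provides node monitoring). NM also maintains the familiar network heartbeat and disk heartbeat, which confirm that nodes are still alive. When a cluster member has no healthy network heartbeat or disk heartbeat, NM evicts it from the cluster, and the evicted node reboots.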

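NM learns which endpoints it must listen on and talk to from the records in the OCR (the OCR stores the interconnect information), and it sends heartbeat messages over the network to the other cluster members. It also monitors the network heartbeats coming from all other members. A network heartbeat occurs every second; if a node's network heartbeat is not received within the number of seconds specified by misscount, that node is considered dead. (For reference: in 10.2.0.1 the default misscount is 60s on Linux and 30s on other platforms, or 600s when third-party vendor clusterware is used, and 10.2.0.1 has no disktimeout; from 10.2.0.4 onward misscount is 60s and disktimeout is 200s; from 11.2 onward misscount is 30s: CRS-4678: Successful get misscount 30 for Cluster Synchronization Services, CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.) NM is also responsible for initiating a cluster reconfiguration when other nodes join or leave the cluster.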
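
The misscount rule can be sketched in a few lines of Python. This is only an illustration of the timeout check described above, not CSS code: the function names and the dictionary of last-seen timestamps are assumptions made for the example, and the default of 30 seconds is the 11.2 value quoted above.

```python
def is_node_dead(last_heartbeat_s, now_s, misscount_s=30.0):
    """Return True when no network heartbeat arrived within misscount seconds."""
    return now_s - last_heartbeat_s >= misscount_s


def dead_nodes(last_seen, now_s, misscount_s=30.0):
    """List the node numbers whose last heartbeat is older than misscount.

    last_seen maps a node number to the time (in seconds) of its last heartbeat;
    this structure is hypothetical and only serves the example.
    """
    return sorted(
        node for node, seen_at in last_seen.items()
        if is_node_dead(seen_at, now_s, misscount_s)
    )


if __name__ == "__main__":
    last_seen = {1: 100.0, 2: 129.0, 3: 130.0}
    print(dead_nodes(last_seen, now_s=131.0))
```

With these sample timestamps only node 1 has been silent for longer than misscount.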

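When resolving a brain split, NM also monitors the voting disk to learn about competing subclusters. Subclusters deserve a short explanation. Imagine an environment with many nodes, for example the 128-node environment that Oracle has built. When the network fails there are several possibilities. One is a global network failure in which none of the 128 nodes can exchange network heartbeats with any other node, which produces up to 128 isolated "island" subclusters. Another is a partial network failure that splits the 128 nodes into several parts, each containing more than one node; these parts are called subclusters. During the failure the nodes inside a subcluster can still communicate and exchange vote messages (vote mesg), but subclusters or island nodes can no longer talk to each other over the regular interconnect. At that point NM reconfiguration has to use the voting disk.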
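
One way to picture subclusters is as the connected groups of a graph whose edges are the interconnect links that still work. The sketch below makes that simplifying assumption (reachability is treated as transitive); find_subclusters and its arguments are hypothetical names used only for illustration, not part of CSS.

```python
from collections import deque


def find_subclusters(nodes, links):
    """Group nodes into subclusters, i.e. sets of nodes that can still reach each other.

    nodes: iterable of node numbers.
    links: iterable of (a, b) pairs whose network heartbeat still works.
    """
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    subclusters = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for peer in sorted(neighbours[node]):
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        subclusters.append(sorted(members))
    return subclusters


if __name__ == "__main__":
    # Partial failure: node 1 is cut off, nodes 2 and 3 still see each other.
    print(find_subclusters([1, 2, 3], [(2, 3)]))
    # Global failure: every node becomes its own island.
    print(find_subclusters([1, 2, 3], []))
```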

Voting Disk

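Because NM uses the voting disk to work around communication failures caused by network faults, the voting disk must be accessible at all times. In normal operation every node performs disk heartbeats, that is, it writes disk heartbeat information to a block on the voting disk once per second. CSS also reads a so-called "kill block" every second; when the kill block indicates that the local node has been evicted from the cluster, CSS reboots the node on its own initiative.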

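To keep the disk heartbeat and the kill block reads working, CSS requires that every node can access at least (N/2+1) voting disks, where N is the number of configured voting disks and the division is an integer division. This guarantees that any two nodes always share at least one voting disk that both can access. In normal circumstances (and only in calm, normal circumstances) a node survives as long as the online voting disks it can access outnumber those it cannot. When the inaccessible voting disks outnumber the accessible ones, the CSS process fails and the node reboots. The claim that 2 voting disks are enough for redundancy and that 3 or more are unnecessary is therefore wrong. Oracle recommends at least 3 voting disks per cluster.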

Question:

Does the number of voting disks have to be odd?

Answer:

An odd number of voting disks is only recommended, not required. In 10gR2 the upper limit is 32 voting disks.

Question:

Can we use 2 or 4 voting disks?

Answer:

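Yes. But counts such as 2 or 4 are at a disadvantage under the hard disk heartbeat rule that "at least (N/2+1) voting disks must be accessible to the node":

With 2 voting disks, no voting disk heartbeat failure can be tolerated.

With 3 voting disks, at most 1 voting disk heartbeat failure can be tolerated.

With 4 voting disks, still at most 1 voting disk heartbeat failure can be tolerated. The fault tolerance is the same as with 3, but the extra voting disk increases management cost and risk.

With 5 voting disks, at most 2 voting disk heartbeat failures can be tolerated.

With 6 voting disks, still at most 2 voting disk heartbeat failures can be tolerated. As before, having one more disk than 5 only adds unjustified management cost and risk.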
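
The arithmetic behind this list follows directly from the (N/2+1) rule with integer division. The short sketch below reproduces it; the function names are chosen for this example and are not Oracle identifiers.

```python
def required_voting_disks(total):
    """Minimum number of voting disks a node must be able to access: N/2 + 1."""
    return total // 2 + 1


def tolerated_failures(total):
    """How many voting disks may fail before the node drops below the minimum."""
    return total - required_voting_disks(total)


if __name__ == "__main__":
    for total in range(2, 7):
        print(total, required_voting_disks(total), tolerated_failures(total))
```

Each even count tolerates no more failures than the odd count just below it, which is why an odd number of voting disks is recommended.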

Question:

If the network heartbeat between the nodes is healthy and a node can heartbeat to more voting disks than it cannot, for example 1 of 3 voting disks has a disk heartbeat timeout, does a brain split occur?

Answer:

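This triggers neither a brain split nor the node eviction protocol. When only a minority of the voting disks suffer a disk heartbeat failure, so that at least (N/2+1) remain accessible, the failure may be caused by a short-lived I/O error while the node accesses the voting disk. CSS immediately marks the failed voting disks OFFLINE. Although some voting disks are OFFLINE, at least (N/2+1) are still available, so the eviction protocol is not invoked and no node is rebooted. The Disk Ping Monitor Thread of the node monitor module (DPMT, clssnmDiskPMT) then keeps retrying the OFFLINE voting disks. If a disk becomes accessible for I/O again and the data on it is verified to be uncorrupted, CSS marks that voting disk ONLINE again. If the disk still cannot be accessed within 45s (a value derived from misscount and an internal algorithm), DPMT writes a warning to ocssd.log, for example: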

[    CSSD]2011-11-11 20:11:20.668 >WARNING: clssnmDiskPMT: long disk latency (45940 ms) to voting disk (0//dev/asm-votedisk1)

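Suppose the RAC cluster that raised the clssnmDiskPMT warning above has 3 voting disks, and asm-votedisk1 has already been marked OFFLINE because of an I/O error or some other cause. If a second voting disk now also fails its disk heartbeat, the node has access to fewer than the required number (2) of voting disks, which triggers the eviction protocol and the node reboots.

A disk heartbeat failure on a minority of the voting disks only produces warnings, not fatal errors. Because the majority of the voting disks are still accessible, the warnings are non-fatal and the eviction protocol is not triggered.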

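When an actual NM reconfiguration takes place, all active nodes and all nodes that are joining the cluster take part in it; nodes that do not acknowledge (ack) are left out of the new cluster membership. The reconfiguration consists of several phases:

1. Initialization phase: the reconfig manager (the node with the lowest cluster member number) signals the other nodes to start the reconfig

2. Voting phase: each node sends the reconfig manager the membership it knows about

3. Brain split check phase: the reconfig manager checks for a brain split

4. Eviction phase: the reconfig manager evicts non-member nodes

5. Update phase: the reconfig manager sends the authoritative membership to the member nodes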

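In the brain split check phase the reconfig manager finds the nodes that have a disk heartbeat but no network heartbeat, uses the network heartbeat (where possible) and disk heartbeat information to count the nodes in every competing subcluster, and decides which subcluster survives based on the following 2 factors:

The subcluster with the largest number of nodes

If the subclusters contain the same number of nodes, the subcluster with the lowest node number; for example, in a 2-node RAC environment node 1 always wins.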
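
The two factors above can be expressed as a single ordering: the larger subcluster wins, and ties go to the subcluster that contains the lowest node number. The following is a minimal sketch of that rule only; pick_survivor is a hypothetical helper, not a CSS function.

```python
def pick_survivor(subclusters):
    """Return the subcluster that survives a brain split.

    subclusters: list of lists of node numbers.
    Rule 1: the subcluster with the most nodes wins.
    Rule 2: on a tie, the subcluster holding the lowest node number wins.
    """
    return max(subclusters, key=lambda members: (len(members), -min(members)))


if __name__ == "__main__":
    # Node 1 is isolated while nodes 2 and 3 still see each other.
    print(pick_survivor([[1], [2, 3]]))
    # Two single-node subclusters: the lower node number wins.
    print(pick_survivor([[2], [3]]))
```

These two inputs correspond to the two log scenarios in this article: in the first, nodes 2 and 3 survive; in the second, node 2 survives.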

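I/O fencing with the STONITH algorithm (remote power reset)

STONITH is a widely used I/O fencing algorithm, and RAC needs it as the interface for shutting down nodes remotely. The idea is simple: when software running on one node wants to make sure that the other nodes in the cluster cannot use a resource, it pulls the plug on those nodes. It is a simple, reliable and somewhat brutal algorithm. The advantages of STONITH are that it has no specific hardware requirements and does not limit the scalability of the cluster.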

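The Process Monitor module of Oracle Clusterware implements I/O fencing and ensures that nodes or instances working out of step cannot cause corruption. The actual work is done by hangcheck timer or oprocd. On Linux, oprocd did not exist before 10.2.0.4 (other Unix platforms had it from 10.2.0.1), so the hangcheck timer software had to be installed before RAC to provide I/O fencing; from 10.2.0.4 onward Linux has oprocd as well. See the article <Know about RAC Clusterware Process OPROCD> for details. These I/O fencing processes are generally locked in memory, run at real-time priority, sleep for a fixed interval, and run as root. If such a process wakes up and finds that it is already too late, it forces a reboot; if the process itself fails, the node also reboots. oprocd is therefore a critical process in a RAC environment; do not kill it by hand.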
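
The "sleep a fixed time, then check whether it woke up too late" behaviour can be illustrated with a small sketch. This is not oprocd's code: the function name, the interval_s and margin_s parameters and their default values are all assumptions made for the example.

```python
def should_reboot(went_to_sleep_s, woke_up_s, interval_s=1.0, margin_s=0.5):
    """Decide whether the node looks hung, based on how late the monitor woke up.

    went_to_sleep_s / woke_up_s: timestamps in seconds.
    interval_s: the fixed sleep time (illustrative value).
    margin_s: how much oversleep is tolerated (illustrative value).
    """
    overslept_s = (woke_up_s - went_to_sleep_s) - interval_s
    return overslept_s > margin_s


if __name__ == "__main__":
    print(should_reboot(0.0, 1.2))  # slightly late, within the margin
    print(should_reboot(0.0, 2.0))  # far too late, the node was stalled
```

The point of the design is that a process which cannot even wake up on time is evidence that the whole node is stalled, so fencing it is safer than letting it resume I/O.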

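After the brain split check comes the eviction phase. Evicted nodes receive an eviction message sent to them (if the network is available); if the message cannot be sent, the eviction notice is written to the kill block on the voting disk instead. The reconfig manager also waits for the evicted nodes to indicate that they have received the eviction notice, either through network communication or through status information on the voting disk.

As you can see, the brain split check in Oracle CSS tries to keep the largest subcluster alive so that the RAC system retains the highest possible availability.

Sample logs from real cases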

Node 1 loses its network; nodes 2 and 3 form a subcluster and evict node 1 through the voting disk:

ocssd.log on node 1:

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50% heartbeat fatal, eviction in 29.610 seconds

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1037, misstime 30390

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 50% heartbeat fatal, eviction in 29.150 seconds

Node 1 starts the misscount countdown for nodes 2 and 3

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30850

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50% heartbeat fatal, eviction in 28.610 seconds

[    CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75% heartbeat fatal, eviction in 14.580 seconds

[    CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 75% heartbeat fatal, eviction in 14.120 seconds

[    CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75% heartbeat fatal, eviction in 13.580 seconds

[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 5.560 seconds

[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 5.100 seconds

[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 4.560 seconds

[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 4.100 seconds

[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 3.560 seconds

[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 3.100 seconds

[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 2.560 seconds

[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 2.100 seconds

[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 1.550 seconds

[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 1.090 seconds
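
The numbers in the node 1 log can be cross-checked against misscount. For each node, the misstime value (in milliseconds) and the "eviction in ... seconds" value logged at the same timestamp should add up to the configured misscount. The sketch below does that arithmetic for the two pairs taken from the log above; the helper name is made up for this example.

```python
# (node name, misstime in ms, "eviction in" value in seconds), copied from the node 1 log
SAMPLES = [
    ("vrh2", 30390, 29.610),
    ("vrh3", 30850, 29.150),
]


def implied_misscount_ms(misstime_ms, eviction_in_s):
    """Time already missed plus time left before eviction, in milliseconds."""
    return misstime_ms + round(eviction_in_s * 1000)


if __name__ == "__main__":
    for name, misstime_ms, eviction_in_s in SAMPLES:
        print(name, implied_misscount_ms(misstime_ms, eviction_in_s))
```

Both pairs add up to 60000 ms, which matches a misscount of 60s on this cluster.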

ocssd.log on node 2:

[    CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, eviction in 29.800 seconds

[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:   clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30200

[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, eviction in 28.790 seconds

[    CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, eviction in 14.800 seconds

[    CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, eviction in 13.800 seconds

[    CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 5.800 seconds

[    CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 4.800 seconds

[    CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 3.800 seconds

[    CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 2.800 seconds

[    CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 1.790 seconds

[    CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 0.800 seconds

[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]

clssnmHandleSync acknowledges the sync message sent by node 3

[    CSSD]2011-04-23 17:12:35.399 [5073104] >USER:    NMEVENT_SUSPEND [00][00][00][0e]

A Node Monitoring SUSPEND event occurs

[    CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(10)

clssnmSendVoteInfo sends the vote message (vote mesg) to node 3

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7e80ae8), probe((nil))

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDeactivateNode: node 1 (vrh1) left cluster

Node 1 is confirmed to have left the cluster

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:12:35.415 [117574544] >USER:    clssnmHandleUpdate: SYNC(10) from node(3) completed

[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:    clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:12:35.416 [117574544] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:12:35.416 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (10)

[    CSSD]2011-04-23 17:12:35.416 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][0c]

[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2

[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 10

[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:   clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa

[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:   clssgmPeerListener: connects done (2/2)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)

[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2011-04-23 17:12:35.428 [3021970320] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

Reconfiguration complete

[    CSSD]2011-04-23 17:12:35.440 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(10), with status(1)

ocssd.log on node 3:

[    CSSD]2011-04-23 17:12:36.303 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, eviction in 29.220 seconds

[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:   clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30780

[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:12:57.889 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, eviction in 14.220 seconds

[    CSSD]2011-04-23 17:13:10.674 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 5.220 seconds

[    CSSD]2011-04-23 17:13:12.115 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 4.220 seconds

[    CSSD]2011-04-23 17:13:13.597 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 3.210 seconds

[    CSSD]2011-04-23 17:13:15.024 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 2.220 seconds

[    CSSD]2011-04-23 17:13:16.504 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 1.220 seconds

[    CSSD]2011-04-23 17:13:17.987 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 0.220 seconds

[    CSSD]2011-04-23 17:13:18.325 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh1 (1), flags 0x040d, state 3, wt4c 0

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 10

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ALIVE

[    CSSD]2011-04-23 17:13:18.327 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(10)

[    CSSD]2011-04-23 17:13:18.329 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(2)

[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]

[    CSSD]2011-04-23 17:13:18.330 [8136912] >USER:    NMEVENT_SUSPEND [00][00][00][0e]

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(60010) state(5)

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:13:18.334 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(2)

[    CSSD]2011-04-23 17:13:18.335 [89033616] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(10)

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(13)

This completes the vote mesg exchange between nodes 2 and 3. These messages contain the node identifier, the GM peer-to-peer listening endpoint, and the view of cluster membership.

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

The check of the information on the voting disk begins

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmCheckDskInfo: node 1, vrh1, state 5 with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2

Another subcluster is found. It contains node 1, with node 1 as its leader, and it is the smaller subcluster. Nodes 3 and 2 form the largest subcluster, with node 2 as the leader.

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Start

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Evicting node 1, vrh1, birth 9, death 10, impendingrcfg 1, stateflags 0x40d

The eviction of node 1 is initiated

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmSendShutdown: req to node 1, kill time 443294

[    CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE:   clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7eaf220), probe((nil))

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 1, vrh1, undead 1

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 1, vrh1, down, LATS(443144),timeout(150)

clssnmCheckKillStatus checks whether node 1 is down

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(10)

[    CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(2)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth (9/9) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmDeactivateNode: node 1 (vrh1) left cluster

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: SYNC(10) from node(3) completed

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 10 complete!

[    CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (10)

[    CSSD]2011-04-23 17:13:18.350 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][0c]

[    CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)

[    CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE:   clssgmDispatchCMXMSG(): got message type(7) src(2) incarn(10) during incarn(9/9)

[    CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

.........................several lines omitted.........................

[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)

[    CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 10

[    CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE:   clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa

[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 10

[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmPeerListener: connects done (2/2)

[    CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)

Update phase

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 3, master node number 2

[    CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(10), with status(1)

In another scenario node 1 has not joined the cluster and node 2 loses its network. Because node 2 has the lower member number, it evicts node 3 through the voting disk.

ocssd.log on node 2:

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50% heartbeat fatal, eviction in 29.890 seconds

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30110

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50% heartbeat fatal, eviction in 28.890 seconds

[    CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75% heartbeat fatal, eviction in 14.860 seconds

[    CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75% heartbeat fatal, eviction in 13.860 seconds

[    CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 5.840 seconds

[    CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 4.840 seconds

[    CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 3.840 seconds

[    CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 2.840 seconds

[    CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 1.840 seconds

[    CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 0.830 seconds

[    CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 3, vrh3, misstime(60000) state(5)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:42:32.021 [5073104] >USER:    NMEVENT_SUSPEND [00][00][00][0c]

[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(13)

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmCheckDskInfo: node 3, vrh3, state 5 with leader 3 has smaller cluster size 1; my cluster size 1 with leader 2

The voting disk check shows that the subcluster of node 3 is the "smaller" subcluster (node 3 has a higher node number than node 2); node 2 counts as the largest subcluster

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmEvict: Start

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmEvict: Evicting node 3, vrh3, birth 3, death 13, impendingrcfg 1, stateflags 0x40d

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmSendShutdown: req to node 3, kill time 1643084

The eviction of node 3 and the shutdown request are initiated

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmDiscHelper: vrh3, node(3) connection failed, con (0xb7e79bb0), probe((nil))

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 3, vrh3, undead 1

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 3, vrh3, down, LATS(1642874),timeout(210)

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(13)

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth (3/3) (old/new)

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmDeactivateNode: node 3 (vrh3) left cluster

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: SYNC(13) from node(2) completed

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 13 complete!

[    CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (13)

[    CSSD]2011-04-23 17:42:32.024 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][04]

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version) birth(3/3)

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3

................several lines omitted..............

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON) birth(3/3)

[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 13

[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerDeactivate: node 3 (vrh3), death 13, state 0x0 connstate 0xf

[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerListener: connects done (1/1)

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

Reconfiguration complete

[    CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(13), with status(1)

ocssd.log on node 3:

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 50% heartbeat fatal, eviction in 29.360 seconds

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1039, misstime 30640

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)

[    CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 75% heartbeat fatal, eviction in 14.330 seconds

[    CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 5.310 seconds

[    CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 4.300 seconds

[    CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 3.300 seconds

[    CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 2.300 seconds

[    CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 1.300 seconds

[    CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 0.300 seconds

[    CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0
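
The countdown in the warnings above can be reproduced with a little arithmetic. The following is a minimal sketch, not CSSD's actual implementation: it assumes misscount is 60 s (which matches the first warning, where misstime 30640 ms plus 29.360 s remaining adds up to 60 s) and takes the 50/75/90 warning levels from the log lines themselves. The function name `heartbeat_status` is hypothetical.

```python
# Assumption: misscount = 60 s, consistent with misstime 30640 ms + 29.360 s remaining.
MISSCOUNT_MS = 60_000

# Warning levels as they appear in the clssnmPollingThread messages.
WARNING_LEVELS = (50, 75, 90)


def heartbeat_status(misstime_ms, misscount_ms=MISSCOUNT_MS):
    """Model the polling thread's view of a node whose heartbeat is missing.

    Returns (warning_level, seconds_until_eviction, evict), where
    warning_level is the highest level reached (or None), and evict
    becomes True once misstime reaches misscount.
    """
    remaining_s = max(misscount_ms - misstime_ms, 0) / 1000.0
    percent = misstime_ms * 100 // misscount_ms
    level = None
    for threshold in WARNING_LEVELS:
        if percent >= threshold:
            level = threshold
    return level, remaining_s, misstime_ms >= misscount_ms


# misstime values: the first and last are logged; the middle two are
# derived from the logged "eviction in" values (60 s minus remaining).
for misstime in (30640, 45670, 54690, 60010):
    print(misstime, heartbeat_status(misstime))
```

With these inputs the model gives 29.36 s, 14.33 s and 5.31 s remaining at the 50%, 75% and 90% levels, and flags eviction at misstime 60010, which lines up with the "Terminating node 2, vrh2, misstime(60010)" trace further down.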

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ALIVE

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)

[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms

[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]

[    CSSD]2011-04-23 17:43:15.912 [8136912] >USER:    NMEVENT_SUSPEND [00][00][00][0c]

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 2, vrh2, misstime(60010) state(5)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE

[    CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(13)

[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   clssnmCheckDskInfo: Aborting local node to avoid splitbrain.

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   : my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)

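After reading the voting disk, node 3 finds the kill block. Its own subcluster (leader 3, size 1) is competing with node 2's subcluster (leader 2, size 1) of equal size, so to avoid split brain node 3 aborts itself!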

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   ###################################

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   clssscExit: CSSD aborting

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   ###################################

[    CSSD]--- DUMP GROCK STATE DB ---

[    CSSD]----------

[    CSSD]  type 2, Id 4, Name = (crs_version)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 2

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]   . . . . .

[    CSSD]     memberNo =1, seq 12

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 3, Id 11, Name = (_ORA_CRS_FAILOVER)

[    CSSD]  flags: 0x0

[    CSSD]  grant: count=1, type 3, wait 1

[    CSSD]  Member Count =1, master -3

[    CSSD]   . . . . .

[    CSSD]     memberNo =0, seq 0

[    CSSD]     flags = 0x12, granted 1

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 2, Name = (EVMDMAIN)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 2

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 1

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 508

[    CSSD]     publicDataSize = 504

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 508

[    CSSD]     publicDataSize = 504

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 5, Name = (CRSDMAIN)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =1, master 3

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 128

[    CSSD]     publicDataSize = 128

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh1)

[    CSSD]  flags: 0x0

[    CSSD]  grant: count=1, type 3, wait 1

[    CSSD]  Member Count =1, master -3

[    CSSD]   . . . . .

[    CSSD]     memberNo =0, seq 0

[    CSSD]     flags = 0x12, granted 1

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh3)

[    CSSD]  flags: 0x0

[    CSSD]  grant: count=1, type 3, wait 1

[    CSSD]  Member Count =1, master -3

[    CSSD]   . . . . .

[    CSSD]     memberNo =0, seq 0

[    CSSD]     flags = 0x12, granted 1

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 3, Name = (ocr_crs)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 3

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x0, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 32

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 12

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 32

[    CSSD]----------

[    CSSD]----------

[    CSSD]  type 2, Id 1, Name = (#CSS_CLSSOMON)

[    CSSD]  flags: 0x1000

[    CSSD]  grant: count=0, type 0, wait 0

[    CSSD]  Member Count =2, master 2

[    CSSD]   . . . . .

[    CSSD]     memberNo =2, seq 1

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 2, nodeBirth 2

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]   . . . . .

[    CSSD]     memberNo =3, seq 2

[    CSSD]     flags = 0x1000, granted 0

[    CSSD]     refCnt = 1

[    CSSD]     nodeNum = 3, nodeBirth 3

[    CSSD]     privateDataSize = 0

[    CSSD]     publicDataSize = 0

[    CSSD]----------

[    CSSD]--- END OF GROCK STATE DUMP ---

[    CSSD]------- Begin Dump -------

Supplement: official documentation

The RM (Reconfig Manager) sends a sync message to all participating nodes. Participating nodes respond with a sync acknowledgement. After this the vote phase begins and the master sends a vote message to all participating nodes. Participating nodes respond with a vote info message containing their node identifier and GM peer to peer listening endpoint. In the split-check phase, the RM uses the voting disk to verify there is no split-brain. It finds nodes heartbeating to disk that are not connected via the network. If it finds these, it will determine which nodes are talking to which and the largest subcluster survives. For example, if we have a 5 node cluster and all of the nodes are heartbeating to the voting disk but only a group of 3 can communicate via the network and a group of 2 can communicate via the network, this means we have 2 subclusters. The largest subcluster (3) would survive while the other subcluster (2) would not. After this the evict phase would evict nodes previously in the cluster but not considered members in this incarnation. In this case we would send a message to evicted nodes (if possible) and write eviction notice to a ‘kill’ block in the voting file. We would wait for the node to indicate it got the eviction notice (wait for seconds). The wait is terminated by a message or status on the voting file indicating that the node got the eviction notice. In the update phase the master sends an update message containing the definitive cluster membership and node information for all participating nodes. The participating nodes send update acknowledgements. All members queue the reconfiguration event to their GM.

As far as voting disks are concerned, a node must be able to access strictly more than half of the voting disks at any time. So if you want to be able to tolerate a failure of n voting disks, you must have at least 2n+1 configured. (n=1 means 3 voting disks). You can configure up to 32 voting disks, providing protection against 15 simultaneous disk failures.

Oracle recommends that customers use 3 or more voting disks in Oracle RAC 10g Release 2. Note: For best availability, the 3 voting files should be physically separate disks. It is recommended to use an odd number as 4 disks will not be any more highly available than 3 disks, 1/2 of 3 is 1.5…rounded to 2, 1/2 of 4 is 2, once we lose 2 disks, our cluster will fail with both 4 voting disks or 3 voting disks.
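
The "strictly more than half" rule and the 2n+1 sizing can be checked with a few lines of arithmetic. This is only an illustrative sketch of the rule as stated above; the function names are hypothetical and nothing here calls into Oracle Clusterware.

```python
def min_accessible(total_disks):
    """Smallest number of voting disks a node must reach:
    strictly more than half of the configured disks."""
    return total_disks // 2 + 1


def tolerated_failures(total_disks):
    """How many voting disks can fail before a node loses its majority."""
    return total_disks - min_accessible(total_disks)


for total in (1, 2, 3, 4, 5, 32):
    print(total, min_accessible(total), tolerated_failures(total))
```

Under this rule 3 and 4 voting disks both tolerate exactly one failure, which is why an odd number is recommended, and 32 disks tolerate 15 failures, matching the figure quoted above. Two voting disks tolerate none, so a second disk adds no protection over a single one.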

When the interconnect breaks, the largest possible subcluster is kept up and the other nodes are evicted; in a 2-node cluster the node with the lowest node number remains.

Node eviction: a cluster node is picked as the victim to reboot. Always keep the largest possible cluster up and evict the other nodes. With two nodes: keep the lowest-numbered node up and evict the other.
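
The survival rule can be sketched as a small function. This is a simplified model under stated assumptions, not Oracle's code: subclusters are represented as sets of node numbers, the largest one wins, and a size tie is broken in favour of the subcluster containing the lowest node number. The source only states the tie-break for the 2-node case; applying it to any equal-sized subclusters is an assumption made here. The name `pick_survivor` is hypothetical.

```python
def pick_survivor(subclusters):
    """Return the subcluster (a set of node numbers) that stays up.

    Rule 1: the largest subcluster survives.
    Rule 2 (assumed generalisation of the 2-node case): on a size tie,
    the subcluster holding the lowest node number survives.
    """
    return max(subclusters, key=lambda nodes: (len(nodes), -min(nodes)))


# The 5-node example from the text: a group of 3 and a group of 2.
print(pick_survivor([{1, 2, 3}, {4, 5}]))

# The case in the logs above: node 3 alone versus node 2 alone.
print(pick_survivor([{3}, {2}]))
```

In the second call the two subclusters are the same size, so the one containing node 2 is kept, which agrees with the ocssd.log of node 3 aborting itself with "my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)".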


Reference: http://www.askmaclean.com/archives/oracle-rac-brain-split-resolution.html
