Part 2: About Internode Communications in Cassandra 1.0.x (Draft)

Part 2 translates the sections on internode communications, cluster membership, and failure detection and recovery.

Original text

About Internode Communications (Gossip)

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.

In Cassandra, the gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.

Translation

About Internode Communications (Gossip Protocol) (Translation Draft)

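Cassandra uses a protocol called gossip to discover the location and state of the other nodes taking part in the cluster. Gossip is a peer-to-peer (P2P) communication protocol in which nodes periodically exchange state information, both about themselves and about the other nodes they already know of.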

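The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. Nodes exchange information about themselves and about the other nodes they have gossiped about, so every node quickly learns about all the others.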

Every gossip message carries a version, so that during a gossip exchange older information is overwritten with the most current state for a given node.
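The exchange described above can be sketched in a few lines of Python. This is only a toy model, not Cassandra's implementation: the names `merge_state` and `gossip_round`, the `(version, status)` tuples, and the in-memory `states` dictionary are all assumptions made for illustration. The only behaviour taken from the text is that each node talks to up to three peers per round and that the entry with the higher version wins.

```python
import random


def merge_state(local, remote):
    """Merge remote gossip state into local state.

    Both arguments map a node name to a (version, status) tuple.
    For each node, the entry with the higher version overwrites the older one.
    """
    for node, (version, status) in remote.items():
        if node not in local or local[node][0] < version:
            local[node] = (version, status)
    return local


def gossip_round(states, fanout=3, rng=random):
    """Run one gossip round over all nodes (hypothetical helper).

    `states` maps each node name to that node's view of the cluster.
    Every node exchanges state in both directions with up to `fanout` peers.
    """
    nodes = list(states)
    for node in nodes:
        peers = [p for p in nodes if p != node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            merge_state(states[peer], states[node])
            merge_state(states[node], states[peer])
    return states


if __name__ == "__main__":
    # Each node starts out knowing only about itself.
    cluster = {name: {name: (1, "up")} for name in ["a", "b", "c", "d"]}
    gossip_round(cluster)
    for name, view in cluster.items():
        print(name, sorted(view))
```

With four nodes and a fanout of three, every node contacts every other node in a single round, so all views converge immediately; in a larger cluster convergence takes several rounds.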

Translator's note

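China has a mysterious "technique" called bagua (八卦, the Eight Trigrams). It is said to be a systematic scheme that connects in every direction, and since every element of the bagua is linked to the others, there must be cause and effect among them. Cassandra's gossip is a fair match for the Chinese bagua, and it goes one step further by turning the idea into a practical, working design.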

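The whole world also has something called "gossip news" (八卦新聞 in Chinese), and it shares the same trait: you tell me, I tell you, you tell someone else. It resembles a 1-to-N relationship in data where N is unknown. N could be 10 or 10,000, just as you cannot estimate how many people a piece of gossip reaches. But gossip does spread fast, because everyone is interested in it.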

Original text

About Cluster Membership and Seed Nodes

When a node first starts up, it looks at its configuration file to determine the name of the Cassandra cluster it belongs to and which node(s), called seeds, to contact to obtain information about the other nodes in the cluster. These cluster contact points are configured in the cassandra.yaml configuration file for a node.

To prevent partitions in gossip communications, all nodes in a cluster should have the same list of seed nodes listed in their configuration file. This is most critical the first time a node starts up. By default, a node will remember other nodes it has gossiped with between subsequent restarts.

Note

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.

To know what range of data it is responsible for, a node must also know its own token and those of the other nodes in the cluster. When initializing a new cluster, you should generate tokens for the entire cluster and assign an initial token to each node before starting up. Each node will then gossip its token to the others. See About Data Partitioning in Cassandra for more information about partitioners and tokens.

Translation

About Cluster Membership and Seed Nodes

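When a node starts up for the first time, it reads its configuration file to determine the name of the cluster it belongs to and which nodes, called seeds, it should contact to obtain information about the other nodes in the cluster. These cluster contact points are set in each node's cassandra.yaml file.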

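To prevent partitions in gossip communications, every node in a cluster should have the same list of seed nodes in its configuration file. This matters most the first time a node starts up. By default, a node remembers the other nodes it has gossiped with across subsequent restarts.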

Note

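The seed designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, and they play no other special role in cluster operations.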

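To know which range of data it is responsible for, a node must know its own token and the tokens of the other nodes in the cluster. When initializing a new cluster, you should generate tokens for the whole cluster and assign an initial token to each node before starting it. Each node then gossips its token to the others. See "About Data Partitioning in Cassandra" for more on partitioners and tokens.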
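As an illustration of generating tokens for a whole cluster up front, here is a minimal Python sketch. It assumes the cluster uses RandomPartitioner, whose token space runs from 0 to 2**127, and it spaces the tokens evenly; neither the partitioner choice nor the function name `initial_tokens` comes from the text above. Each resulting value would then be assigned to one node as its initial token before that node is started.

```python
def initial_tokens(node_count):
    """Return evenly spaced initial tokens for `node_count` nodes.

    Assumes the RandomPartitioner token range of 0 .. 2**127.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * (2 ** 127) // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print one token per node for a hypothetical 4-node cluster.
    for index, token in enumerate(initial_tokens(4)):
        print(f"node {index}: {token}")
```

Evenly spaced tokens give each node an equal share of the ring; a cluster with a different partitioner would need a different scheme.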

Original text

About Failure Detection and Recovery

Failure detection is a method for locally determining, from gossip state, if another node in the system is up or down. Failure detection information is also used by Cassandra to avoid routing client requests to unreachable nodes whenever possible. (Cassandra can also avoid routing requests to nodes that are alive, but performing poorly, through the dynamic snitch.)

The gossip process tracks heartbeats from other nodes both directly (nodes gossiping directly to it) and indirectly (nodes heard about secondhand, thirdhand, and so on). Rather than have a fixed threshold for marking nodes without a heartbeat as down, Cassandra uses an accrual detection mechanism to calculate a per-node threshold that takes into account network conditions, workload, or other conditions that might affect perceived heartbeat rate. During gossip exchanges, every node maintains a sliding window of inter-arrival times of gossip messages from other nodes in the cluster. The value of phi is based on the distribution of inter-arrival time values across all nodes in the cluster. In Cassandra, configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. The default value is fine for most situations, but DataStax recommends increasing it to 12 for Amazon EC2 due to the network congestion frequently experienced on that platform.

Node failures can result from various causes such as hardware failures, network outages, and so on. Node outages are often transient but can last for extended intervals. A node outage rarely signifies a permanent departure from the cluster, and therefore does not automatically result in permanent removal of the node from the ring. Other nodes will still try to periodically initiate gossip contact with failed nodes to see if they are back up. To permanently change a node's membership in a cluster, administrators must explicitly add or remove nodes from a Cassandra cluster using the nodetool utility.

When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Once the failure detector marks a node as down, missed writes are stored by other replicas if hinted handoff is enabled (for a period of time, anyways). However, it is possible that some writes were missed between the interval of a node actually going down and when it is detected as down. Or if a node is down for longer than max_hint_window_in_ms (one hour by default), hints will no longer be saved. For that reason, it is best practice to routinely run nodetool repair on all nodes to ensure they have consistent data, and to also run repair after recovering a node that has been down for an extended period.

Translation

About Failure Detection and Recovery

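Failure detection is a way of determining locally, from gossip state, whether another node in the system is up or down. Cassandra also uses failure detection information to avoid routing client requests to unreachable nodes whenever possible. (Through the dynamic snitch, Cassandra can also avoid routing requests to nodes that are alive but performing poorly.)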

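Gossip tracks heartbeats from other nodes both directly (nodes gossiping straight to it) and indirectly (nodes heard about secondhand, thirdhand, and so on). Instead of using a fixed threshold to mark a node without a heartbeat as down, Cassandra uses an accrual detection mechanism that calculates a per-node threshold, taking into account network conditions, workload, and other factors that might affect the perceived heartbeat rate. During gossip exchanges, every node maintains a sliding window of the inter-arrival times of gossip messages from the other nodes in the cluster. The value of phi is based on the distribution of those inter-arrival times across all nodes in the cluster. Setting phi_convict_threshold adjusts the sensitivity of the failure detector. The default value is fine for most situations, but DataStax recommends raising it to 12 on Amazon EC2 because of the network congestion frequently experienced on that platform.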
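The accrual idea can be made concrete with a small Python sketch. It is a simplification and not Cassandra's code: it keeps one sliding window per monitored node, assumes heartbeat inter-arrival times follow an exponential distribution, and defines phi as the negative base-10 logarithm of the probability that a heartbeat would still arrive after the observed silence. The class name `AccrualFailureDetector`, the window size of 1000, and the threshold value of 8 used here are assumptions for illustration.

```python
import math
from collections import deque


class AccrualFailureDetector:
    """Toy phi accrual detector for a single monitored node."""

    def __init__(self, window_size=1000, threshold=8.0):
        # Sliding window of inter-arrival times between heartbeats.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None
        self.threshold = threshold  # plays the role of phi_convict_threshold

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level: grows the longer the node stays silent."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_arrival
        # -log10(exp(-elapsed / mean)) under the exponential assumption.
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        return self.phi(now) > self.threshold


if __name__ == "__main__":
    detector = AccrualFailureDetector()
    for t in (0.0, 1.0, 2.0, 3.0):
        detector.heartbeat(t)
    print(detector.phi(4.0), detector.is_down(4.0))
    print(detector.phi(23.0), detector.is_down(23.0))
```

Because phi scales with the observed mean interval, a node on a congested network with slower heartbeats is given more slack automatically, and a higher threshold such as 12 makes the detector more tolerant still.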

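Node failures can have many causes, such as hardware failures, network outages, and so on. Node outages are often transient but can last for extended periods. An outage rarely means the node has left the cluster for good, so it does not automatically result in the node being permanently removed from the ring. Other nodes still periodically try to initiate gossip contact with a failed node to see whether it is back up. To permanently change a node's membership in a cluster, administrators must explicitly add or remove the node using the nodetool utility.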

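When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Once the failure detector marks a node as down, the writes it misses are stored by other replicas if hinted handoff is enabled (for a limited period of time, at least). However, some writes may have been missed in the interval between the node actually going down and it being detected as down. And if a node is down for longer than max_hint_window_in_ms (one hour by default), hints are no longer saved. For these reasons, the best practice is to run nodetool repair routinely on all nodes to ensure they hold consistent data, and also to run repair after recovering a node that has been down for an extended period.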
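The hint window rule above amounts to a simple time comparison, sketched here in Python. The function name `hints_still_saved` and its timestamp arguments are hypothetical, and whether the real boundary is inclusive is an assumption; only the one-hour default for max_hint_window_in_ms comes from the text.

```python
ONE_HOUR_MS = 60 * 60 * 1000  # default max_hint_window_in_ms per the text


def hints_still_saved(down_since_ms, now_ms, max_hint_window_in_ms=ONE_HOUR_MS):
    """Return True while a down node is still inside the hint window.

    Once this returns False, writes for the node are no longer hinted,
    so the node will need a repair after it recovers.
    """
    return (now_ms - down_since_ms) <= max_hint_window_in_ms


if __name__ == "__main__":
    # A node that has been down for 30 minutes, then for 2 hours.
    print(hints_still_saved(0, 30 * 60 * 1000))
    print(hints_still_saved(0, 2 * ONE_HOUR_MS))
```

Even inside the window, writes missed before the node was detected as down are not covered by hints, which is why routine repair is still recommended.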

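Translator's note: this article is poorly translated, mainly because I still need a better understanding of Cassandra's complex failure handling mechanism. I also find the wording really poor in places, and I hope experienced readers can help review it.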

名劍傳奇
