Notes on the MY-011495 error in MySQL 8.0

I. Error messages:

[Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 172.29.12.79:24803 has become unreachable.'

[Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 172.29.12.80:24805 has become unreachable.'

Plugin group_replication reported: 'This server is not able to reach a majority of members in the group. This server will now block all updates. The server will remain blocked until contact with the majority is restored. It is possible to use group_replication_force_members to force a new group membership.'

II. Error code reference:

Error number: MY-011495; Symbol: ER_GRP_RPL_SRV_BLOCKED; SQLSTATE: HY000

Message: This server is not able to reach a majority of members in the group. This server will now block all updates. The server will remain blocked until contact with the majority is restored. It is possible to use group_replication_force_members to force a new group membership.

III. Where the error is raised:

void Plugin_gcs_events_handler::on_suspicions(
    const std::vector<Gcs_member_identifier> &members,
    const std::vector<Gcs_member_identifier> &unreachable)

  /* Reachable members (group size minus unreachable members) number no more
     than half of the group size (integer division): the majority is lost */
  if ((members.size() - unreachable.size()) <= (members.size() / 2)) {
    /* This server can no longer reach a majority of the group */
    if (!group_partition_handler->get_timeout_on_unreachable())
      LogPluginErr(ERROR_LEVEL, ER_GRP_RPL_SRV_BLOCKED); /* logs MY-011495 */
    else
      LogPluginErr(ERROR_LEVEL, ER_GRP_RPL_SRV_BLOCKED_FOR_SECS,
                   group_partition_handler->get_timeout_on_unreachable());
    if (!group_partition_handler->is_partition_handler_running() &&
        !group_partition_handler->is_partition_handling_terminated())
      group_partition_handler->launch_partition_handler_thread();
    // flag as having lost quorum
    m_notification_ctx.set_quorum_lost();
  }

/* Declaration of get_timeout_on_unreachable */
  /**
    @return the configured timeout
      @retval 0  The partition thread wont run or timeout.
      @retval >0 After this seconds the plugin will move to ERROR in a minority
  */
  ulong get_timeout_on_unreachable();
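
The condition in on_suspicions is easy to misread, so here is a minimal Python sketch of the same arithmetic. It is not MySQL code: `quorum_lost` is a hypothetical helper name and the group sizes are only illustrative. It assumes the member list includes the local server, which is what the log messages at the top of this post suggest.

```python
def quorum_lost(total_members: int, unreachable: int) -> bool:
    """Return True when the reachable members are no more than half the group.

    Mirrors the check quoted from the plugin source:
    (members.size() - unreachable.size()) <= (members.size() / 2)
    """
    if not 0 <= unreachable <= total_members:
        raise ValueError("unreachable must be between 0 and total_members")
    reachable = total_members - unreachable
    # Integer division, like the C++ expression on unsigned sizes.
    return reachable <= total_members // 2


if __name__ == "__main__":
    # Print the outcome for a few illustrative group sizes.
    for total in (3, 4, 5):
        for unreachable in range(total + 1):
            state = "BLOCKED" if quorum_lost(total, unreachable) else "ok"
            print(f"group of {total}, {unreachable} unreachable: {state}")
```

With the two unreachable addresses from the log and an assumed three-member group, `quorum_lost(3, 2)` is true, which matches the MY-011495 message. Note that a group of four already blocks when two members are unreachable, because exactly half is not a majority.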

IV. Conclusion:

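When a member of an MGR (MySQL Group Replication) cluster cannot reach a majority of the group, it has lost quorum: it logs MY-011495 and blocks all updates, which is how the group protects itself against split-brain. Once you have confirmed that the unreachable members are really down, the group can be recovered with group_replication_force_members.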

V. Fix:

1. Check the current state of the group:

SELECT MEMBER_ID,MEMBER_STATE FROM performance_schema.replication_group_members;

2. Check the local address of the current node:

SELECT @@group_replication_local_address;

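3. Set group_replication_force_members to the addresses of the members that are still reachable (a comma-separated list of their group_replication_local_address values). Example with a single surviving node: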

set global group_replication_force_members = @@group_replication_local_address;

4. Once the group has recovered, clear group_replication_force_members:

SET GLOBAL group_replication_force_members = '';
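
Step 3 only shows the single-survivor case. When several members are still reachable, the value is a comma-separated list of their addresses. The sketch below builds that statement from a list of addresses. `build_force_members_sql` is a hypothetical helper, the 10.0.0.x addresses are placeholders rather than values from this cluster, and the simple host:port pattern is an assumption that does not cover bracketed IPv6 addresses.

```python
import re

# Assumed address shape: hostname or IPv4 address, a colon, then a port number.
_ADDRESS = re.compile(r"^[A-Za-z0-9.\-]+:\d{1,5}$")


def build_force_members_sql(reachable_addresses):
    """Build a SET GLOBAL statement listing the members that are still reachable."""
    if not reachable_addresses:
        raise ValueError("at least one reachable member address is required")
    unique = []
    for address in reachable_addresses:
        address = address.strip()
        if not _ADDRESS.match(address):
            raise ValueError(f"not a host:port address: {address!r}")
        if address not in unique:  # keep order, drop duplicates
            unique.append(address)
    value = ",".join(unique)
    return f"SET GLOBAL group_replication_force_members = '{value}';"


if __name__ == "__main__":
    # Placeholder addresses for two surviving members.
    print(build_force_members_sql(["10.0.0.1:33061", "10.0.0.2:33061"]))
```

The generated statement is meant to be run on one of the surviving members, and only after checking that the members left out of the list are really stopped.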

VI. How to keep a majority of nodes from leaving the group abnormally:

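1. Set how long, in seconds, a member that has lost contact with the majority waits before it leaves the group (the default, 0, means it waits indefinitely):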

group_replication_unreachable_majority_timeout

2. Set how many times a member retries connecting to a donor during distributed recovery:

group_replication_recovery_retry_count

3. Set the interval between reconnection attempts:

group_replication_recovery_reconnect_interval: sleep time, in seconds, between recovery connection attempts

Example:

SET GLOBAL group_replication_unreachable_majority_timeout=600;

SET GLOBAL group_replication_recovery_retry_count= 5;

SET GLOBAL group_replication_recovery_reconnect_interval= 120;

VII. How to keep large transactions from forcing members out of the group:

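1. Enable message compression: messages larger than group_replication_compression_threshold (in bytes) are compressed before they are sent: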

STOP GROUP_REPLICATION;

SET GLOBAL group_replication_compression_threshold= 2097152;

START GROUP_REPLICATION;

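2. Enable message fragmentation (available from MySQL 8.0.16): messages larger than group_replication_communication_max_message_size (in bytes) are split into fragments: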

STOP GROUP_REPLICATION;

SET GLOBAL group_replication_communication_max_message_size= 5242880;

START GROUP_REPLICATION;
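
To get a feel for what the two thresholds above mean for a given transaction, here is a rough Python sketch. It is a simplified model, not the plugin's implementation: the helper names are hypothetical, protocol headers are ignored, and the fragment count is estimated from the size passed in. A value of 0 is treated as "feature disabled" for both settings.

```python
def should_compress(message_bytes: int, compression_threshold: int) -> bool:
    """True when a message exceeds the compression threshold (0 disables it)."""
    if message_bytes < 0 or compression_threshold < 0:
        raise ValueError("sizes must be non-negative")
    if compression_threshold == 0:
        return False
    return message_bytes > compression_threshold


def fragment_count(message_bytes: int, max_message_size: int) -> int:
    """Estimate how many fragments a message needs (0 disables fragmentation)."""
    if message_bytes < 0 or max_message_size < 0:
        raise ValueError("sizes must be non-negative")
    if max_message_size == 0 or message_bytes <= max_message_size:
        return 1
    # Ceiling division without floating point.
    return (message_bytes + max_message_size - 1) // max_message_size


if __name__ == "__main__":
    threshold = 2097152      # 2 MiB, the compression value used above
    max_size = 5242880       # 5 MiB, the fragmentation value used above
    message = 12 * 1024 * 1024  # an illustrative 12 MiB message
    print("compress:", should_compress(message, threshold))
    print("fragments:", fragment_count(message, max_size))
```

With the values used above, an illustrative 12 MiB message is over the 2 MiB compression threshold and, if it were still 12 MiB when sent, would be split into three fragments of at most 5 MiB each.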
