GR node failure: The member contains transactions not present in the group

Background

A shared test environment: a 3-node, single-primary group replication cluster. One day, starting node 3 (node3) failed with: [ERROR] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.'
This post records the troubleshooting process and the lessons learned.

What the error means

The member has transactions that do not exist in the group. Checking node3's data confirmed it: some of its data differs from the other members. Most likely the instance was used as a standalone database while the GR service was not running.
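To see exactly which transactions node3 has that the group does not, the executed GTID sets can be compared. A minimal sketch (the two quoted sets are placeholders for the values captured on node3 and on a healthy member):

-- Run on node3 and on a healthy member, and note the two values
SELECT @@GLOBAL.gtid_executed\G
-- Let the server compute the difference: GTIDs present in the first set but not in the second
SELECT GTID_SUBTRACT('<node3_gtid_executed>', '<group_gtid_executed>') AS extra_transactions;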

How to fix it

According to the documentation, this can be resolved by resetting the master information and rejoining the group.

  1. Check the current master status
mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000021
         Position: 246
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 14a7ed7d-7e42-11ea-add1-dc4a3e82bdf9:1-66725,
ba279ebc-9ac8-11ea-a379-a08cfdcc0735:1-250791:1000181-1000191
1 row in set (0.01 sec)
  2. Reset the master configuration
    The official description: RESET MASTER enables you to delete any binary log files and their related binary log index file, returning the master to its state before binary logging was started.
    In other words, it deletes the binary logs and clears the master's state, including the executed GTID set.
RESET MASTER

Checking the master status again shows that the logs have been cleared.

mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000001
         Position: 150
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
  3. Rejoin the group
CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='password' 
FOR CHANNEL 'group_replication_recovery';
start group_replication;
  4. The error log shows that GR has started and recovery is in progress; the I/O thread for channel 'group_replication_recovery' connects to the donor and starts replicating:
2020-06-15T23:09:28.805095Z 2 [Note] Plugin group_replication reported: 'Group communication SSL configuration: group_replication_ssl_mode: "DISABLED"'
2020-06-15T23:09:28.805232Z 2 [Note] Plugin group_replication reported: '[GCS] Added automatically IP ranges 10.91.57.189/26,127.0.0.1/8 to the whitelist'
2020-06-15T23:09:28.805563Z 2 [Note] Plugin group_replication reported: '[GCS] Translated 's3' to 127.0.0.1'
2020-06-15T23:09:28.805713Z 2 [Warning] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the whitelist. It is mandatory that it is added.'
2020-06-15T23:09:28.805741Z 2 [Note] Plugin group_replication reported: '[GCS] SSL was not enabled'
2020-06-15T23:09:28.805752Z 2 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "ba279ebc-9ac8-11ea-a379-a08cfdcc0735"; group_replication_local_address: "s3:33061"; group_replication_group_seeds: "s1:33061,s2:33061,s3:33061"; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
2020-06-15T23:09:28.805859Z 2 [Note] Plugin group_replication reported: '[GCS] Configured number of attempts to join: 0'
2020-06-15T23:09:28.805880Z 2 [Note] Plugin group_replication reported: '[GCS] Configured time between attempts to join: 5 seconds'
2020-06-15T23:09:28.805893Z 2 [Note] Plugin group_replication reported: 'Member configuration: member_id: 3; member_uuid: "14a7ed7d-7e42-11ea-add1-dc4a3e82bdf9"; single-primary mode: "true"; group_replication_auto_increment_increment: 7; '
2020-06-15T23:09:28.806102Z 185 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-06-15T23:09:28.949470Z 188 [Note] Slave SQL thread for channel 'group_replication_applier' initialized, starting replication in log 'FIRST' at position 0, relay log './s3-relay-bin-group_replication_applier.000001' position: 4
2020-06-15T23:09:28.949474Z 2 [Note] Plugin group_replication reported: 'Group Replication applier module successfully initialized!'
2020-06-15T23:09:28.949674Z 2 [Note] Plugin group_replication reported: 'auto_increment_increment is set to 7'
2020-06-15T23:09:28.949683Z 2 [Note] Plugin group_replication reported: 'auto_increment_offset is set to 3'
2020-06-15T23:09:28.949934Z 0 [Note] Plugin group_replication reported: 'XCom protocol version: 3'
2020-06-15T23:09:28.949962Z 0 [Note] Plugin group_replication reported: 'XCom initialized and ready to accept incoming connections on port 33061'
2020-06-15T23:09:30.817333Z 2 [Note] Plugin group_replication reported: 'This server is working as secondary member with primary member address s2:3306.'
2020-06-15T23:09:30.817574Z 191 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2020-06-15T23:09:30.817687Z 0 [Note] Plugin group_replication reported: 'Group membership changed to s3:3306, s2:3306, s1:3306 on view 15901818388998341:33.'
2020-06-15T23:09:31.060257Z 191 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='s2', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='s1', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2020-06-15T23:09:31.235763Z 191 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor 9a44a006-7b43-11ea-a3eb-a08cfdc63e31 at s1 port: 3306.'
2020-06-15T23:09:31.236009Z 193 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-06-15T23:09:31.237547Z 193 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'repl@s1:3306',replication started in log 'FIRST' at position 4

Check the GR status:

mysql> SELECT * FROM performance_schema.replication_group_members;

All three members are visible, but node3 stays in RECOVERING, and after a while the recovery fails and the node drops out again.
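When a member hangs in RECOVERING, the recovery channel's last error usually explains why (the same errors also appear in the error log, as shown in the next section). A minimal check, assuming the MySQL 5.7 performance_schema tables:

SELECT CHANNEL_NAME, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
  FROM performance_schema.replication_applier_status_by_worker
 WHERE CHANNEL_NAME = 'group_replication_recovery';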

Problems encountered

  1. Error_code: 1050 Table 'repl0523' already exists
2020-06-15T19:16:28.282318Z 22 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Table 'repl0523' already exists' on query. Default database: 'test'. Query: 'create table repl0523 (id int,name varchar(30))', Error_code: 1050

The table already exists on node3; drop it and continue.
Because node3 is now a GR secondary it is read-only by default, and the local DROP must not be replicated to the other members, so binary logging has to be switched off for the session to avoid touching the good data.

Disable binary logging:        SET SQL_LOG_BIN=0;
Allow writes on the node:      set global read_only=0;
Drop the table:                drop table repl0523;
Re-enable binary logging:      SET SQL_LOG_BIN=1;
Reset the master:              RESET MASTER;
Restart group replication:     start group_replication;
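A quick sanity check (a sketch, not part of the original procedure): with SQL_LOG_BIN=0 the drop is not written to the binary log, so the binlog position should not move, assuming no other local writes happen in between:

SHOW MASTER STATUS;        -- note File and Position
SET SQL_LOG_BIN=0;
DROP TABLE test.repl0523;  -- not logged in this session
SET SQL_LOG_BIN=1;
SHOW MASTER STATUS;        -- File and Position should be unchanged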
  2. Multiple primary key defined
2020-06-15T20:14:40.801344Z 150 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Multiple primary key defined' on query. Default database: 'test'. Query: 'ALTER TABLE `test`.`test_record_table`

The replicated statement tries to define another primary key on the table. A table can only have one primary key, and on node3 the table already has one; the binlog event being applied is creating it again. Drop the primary key on node3's copy of the table and continue.
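A sketch of the local fix, again keeping the change out of the binary log; the table name comes from the log entry above, and this assumes the primary key column is not AUTO_INCREMENT (otherwise the key cannot simply be dropped):

SET SQL_LOG_BIN=0;
SHOW CREATE TABLE test.test_record_table\G            -- confirm the current key definition
ALTER TABLE test.test_record_table DROP PRIMARY KEY;  -- let the replicated ALTER recreate it
SET SQL_LOG_BIN=1;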

  3. Error_code: 1007 database exists
 2020-06-15T21:57:50.313727Z 165 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Can't create database 'test_db'; database exists' on query. Default database: 'test_db'. Query: '/* ApplicationName=DataGrip 2019.3.1 */ create schema TEST_DB', Error_code: 1007

The database already exists on node3. Drop it and continue.
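Same pattern, dropping the schema locally only (a sketch, using the database name from the error message):

SET SQL_LOG_BIN=0;
DROP DATABASE test_db;
SET SQL_LOG_BIN=1;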

Summary

  1. After RESET MASTER the node rejoins GR as a brand-new member, and every transaction executed since the group was created is replayed on it, so any leftover local data has to be removed first.
  2. Rejoining does not undo the member's earlier local transactions; those local transactions are exactly what caused node3 to be expelled from GR in the first place.
  3. One open question: online recovery is slow. How can the existing data be loaded into a new node first, and the node joined to GR afterwards? To be investigated later.