MariaDB High-Availability Architecture with MHA

MHA (Master High Availability)

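MHA consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or it can run on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager probes the cluster's master at regular intervals; when the master fails, it automatically promotes the slave with the most recent data to be the new master and then repoints all other slaves to it. The whole failover process is transparent to the application.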
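The idea of "promote the slave with the most recent data" can be illustrated with a toy selection step. This is a simplified sketch, not MHA's actual code: it assumes you have already collected, for each slave, the master log file and position it has read (the `Master_Log_File` and `Read_Master_Log_Pos` fields of `SHOW SLAVE STATUS`), and it ignores settings such as `candidate_master`. The function name `pick_latest_slave` is made up for this example.

```shell
#!/usr/bin/env bash
# Input (stdin): one line per slave, fields separated by a single space:
#   host master_log_file read_master_log_pos
# Output: the host that has read furthest into the master's binary log.
# Binlog file names are zero-padded, so a plain text sort orders them.
pick_latest_slave() {
    sort -k2,2 -k3,3n | tail -n 1 | cut -d' ' -f1
}
```

For example, feeding it two lines with the same log file and positions 900 and 1200 selects the host at position 1200.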

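During automatic failover, MHA tries to save the binary logs from the crashed master so that as little data as possible is lost, but this is not always possible. For example, if the master has a hardware failure or cannot be reached over SSH, MHA cannot save the binary logs; it performs the failover anyway and the most recent data is lost. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk, and MHA can be combined with it. As long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves, which keeps the data on all nodes consistent.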
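As a rough sketch of what enabling semi-synchronous replication involves, the helper below prints the SQL for each role. It assumes MariaDB 10.2 (the version used later in this article), where semi-sync is shipped as a plugin that has to be installed first; the function name `semisync_sql` is invented for this example, and the statements only change the running server, so matching entries in the option file are still needed to survive a restart.

```shell
#!/usr/bin/env bash
# Print the SQL that enables semi-synchronous replication for one role.
# Assumption: MariaDB 10.2, where the semi-sync plugins must be installed.
semisync_sql() {
    local role="$1"
    case "$role" in
        master)
            echo "INSTALL SONAME 'semisync_master';"
            echo "SET GLOBAL rpl_semi_sync_master_enabled = 1;"
            ;;
        slave)
            echo "INSTALL SONAME 'semisync_slave';"
            echo "SET GLOBAL rpl_semi_sync_slave_enabled = 1;"
            # Restart the I/O thread so the slave reconnects as semi-sync.
            echo "STOP SLAVE IO_THREAD;"
            echo "START SLAVE IO_THREAD;"
            ;;
        *)
            echo "usage: semisync_sql master|slave" >&2
            return 1
            ;;
    esac
}
```

One way to use it is `semisync_sql master | mysql -uroot -p` on the master and `semisync_sql slave | mysql -uroot -p` on each slave. Because any slave may be promoted by MHA, it can make sense to install both plugins on every server.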

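At present MHA mainly supports one-master, multiple-slave topologies. To set up MHA, a replication cluster needs at least three database servers: one master and two slaves.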

Official project page: https://code.google.com/p/mysql-master-ha/


Configuring MHA for MariaDB master failover

Four CentOS 7.6 virtual machines; database version 10.2.23-MariaDB.

IP               Role
192.168.148.7    master
192.168.148.27   slave1
192.168.148.37   slave2
192.168.148.47   mha-manager

Prerequisites

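  • Set up semi-synchronous replication across the 3 database servers
  • Install mha4mysql-node on the 3 database servers
  • Install mha4mysql-manager and mha4mysql-node on the manager node (requires the EPEL repository)
  • Set up public-key SSH login between all 4 servers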
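The SSH prerequisite can be met in several ways. The sketch below shows one common lab shortcut, under these assumptions: it is run as root on the manager node, all hosts share a single key pair, and the helper names `run` and `distribute_ssh_key` are invented for this example. By default it only prints the commands; set `DRY_RUN=0` to execute them (each `ssh-copy-id` and `scp` will prompt for the root password once).

```shell
#!/usr/bin/env bash
# Assumption: run as root on the manager node (192.168.148.47); root matches
# ssh_user in the MHA configuration. Sharing one key pair between all hosts
# is a lab shortcut, not a hardened setup.
OTHER_HOSTS="192.168.148.7 192.168.148.27 192.168.148.37"
SELF="192.168.148.47"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        eval "$*"
    fi
}

distribute_ssh_key() {
    local host
    # 1. Create a key pair without a passphrase.
    run "ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa"
    # 2. Authorize the key on this host, which fills authorized_keys.
    run "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$SELF"
    # 3. Copy the whole .ssh directory so every host shares the key pair.
    for host in $OTHER_HOSTS; do
        run "scp -rp /root/.ssh root@$host:/root/"
    done
}
```

Calling `distribute_ssh_key` with the default settings lists the five commands, which makes it easy to review them before running anything for real.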
On the master, create an account that MHA uses for remote management
MariaDB [(none)]> grant all on *.* to mhauser@'192.168.148.%' identified by 'centos';
Enable binary logging on the slaves
vim /etc/mysql/my.cnf
log-bin=/data/bin/mysql-bin
relay_log_purge=0 # do not purge relay logs automatically
skip_name_resolve=1 # skip DNS name resolution
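Before running the MHA checks it can help to confirm that each slave's option file really contains the two settings MHA depends on. The following is a minimal sketch; the function name `check_slave_cnf` is made up for this example, and it only inspects the file you pass in, not the values the running server actually uses.

```shell
#!/usr/bin/env bash
# Return 0 if the given option file enables binary logging and sets
# relay_log_purge=0; otherwise print what is missing and return 1.
check_slave_cnf() {
    local cnf="$1" rc=0
    # Option names may be written with '-' or '_' interchangeably.
    grep -Eq '^[[:space:]]*log[-_]bin([[:space:]]*=|[[:space:]]*$)' "$cnf" \
        || { echo "missing: log-bin"; rc=1; }
    grep -Eq '^[[:space:]]*relay[-_]log[-_]purge[[:space:]]*=[[:space:]]*0' "$cnf" \
        || { echo "missing: relay_log_purge=0"; rc=1; }
    return $rc
}
```

For instance, `check_slave_cnf /etc/mysql/my.cnf && echo "slave config looks ready"` on each slave.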
Write the configuration file on the manager node
[root@localhost ~]# vim /etc/mastermha/app1.cnf 
[server default]
user=mhauser
password=centos
manager_workdir=/opt/mastermha/app1
manager_log=/opt/mastermha/app1/manager.log
remote_workdir=/opt/mastermha/app1
ssh_user=root
repl_user=repluser
repl_password=centos
ping_interval=1
# directory where the master stores its binary logs
master_binlog_dir=/data/bin/

[server1]
hostname=192.168.148.7
candidate_master=1

[server2]
hostname=192.168.148.27
candidate_master=1

[server3]
hostname=192.168.148.37
candidate_master=1
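To double-check which hosts the configuration actually declares, a small helper can list every `[serverN]` section with its `hostname`. This is a sketch under the assumption that the file follows the simple `key=value` layout shown above; `list_mha_servers` is a name invented for this example.

```shell
#!/usr/bin/env bash
# Print "section hostname" for every [serverN] block in an MHA config file.
# The [server default] section is skipped because it holds shared settings.
list_mha_servers() {
    awk -F= '
        /^\[server[0-9]+\]/ { section = $0; gsub(/[^a-z0-9]/, "", section); next }
        /^\[/               { section = ""; next }
        section != "" && $1 ~ /^[ \t]*hostname[ \t]*$/ {
            host = $2; gsub(/[ \t]/, "", host); print section, host
        }
    ' "$1"
}
```

Running `list_mha_servers /etc/mastermha/app1.cnf` against the file above should list the three database servers and nothing else; the manager node itself does not get a section.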
Run the check scripts on the manager node
# Verify SSH public-key login
[root@localhost ~]# masterha_check_ssh --conf=/etc/mastermha/app1.cnf 
Fri May 10 10:11:33 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri May 10 10:11:33 2019 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Fri May 10 10:11:33 2019 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Fri May 10 10:11:33 2019 - [info] Starting SSH connection tests..
Fri May 10 10:11:35 2019 - [debug] 
Fri May 10 10:11:33 2019 - [debug]  Connecting via SSH from root@192.168.148.7(192.168.148.7:22) to root@192.168.148.27(192.168.148.27:22)..
Fri May 10 10:11:34 2019 - [debug]   ok.
Fri May 10 10:11:34 2019 - [debug]  Connecting via SSH from root@192.168.148.7(192.168.148.7:22) to root@192.168.148.37(192.168.148.37:22)..
Fri May 10 10:11:34 2019 - [debug]   ok.
Fri May 10 10:11:36 2019 - [debug] 
Fri May 10 10:11:34 2019 - [debug]  Connecting via SSH from root@192.168.148.37(192.168.148.37:22) to root@192.168.148.7(192.168.148.7:22)..
Fri May 10 10:11:35 2019 - [debug]   ok.
Fri May 10 10:11:35 2019 - [debug]  Connecting via SSH from root@192.168.148.37(192.168.148.37:22) to root@192.168.148.27(192.168.148.27:22)..
Fri May 10 10:11:35 2019 - [debug]   ok.
Fri May 10 10:11:36 2019 - [debug] 
Fri May 10 10:11:34 2019 - [debug]  Connecting via SSH from root@192.168.148.27(192.168.148.27:22) to root@192.168.148.7(192.168.148.7:22)..
Fri May 10 10:11:34 2019 - [debug]   ok.
Fri May 10 10:11:34 2019 - [debug]  Connecting via SSH from root@192.168.148.27(192.168.148.27:22) to root@192.168.148.37(192.168.148.37:22)..
Fri May 10 10:11:35 2019 - [debug]   ok.
Fri May 10 10:11:36 2019 - [info] All SSH connection tests passed successfully.

# Test master-slave replication; if the result is not "OK", check the replication configuration
[root@localhost ~]# masterha_check_repl --conf=/etc/mastermha/app1.cnf 
Fri May 10 10:12:12 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri May 10 10:12:12 2019 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Fri May 10 10:12:12 2019 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Fri May 10 10:12:12 2019 - [info] MHA::MasterMonitor version 0.56.
Fri May 10 10:12:13 2019 - [info] GTID failover mode = 0
Fri May 10 10:12:13 2019 - [info] Dead Servers:
Fri May 10 10:12:13 2019 - [info] Alive Servers:
Fri May 10 10:12:13 2019 - [info]   192.168.148.7(192.168.148.7:3306)
Fri May 10 10:12:13 2019 - [info]   192.168.148.27(192.168.148.27:3306)
Fri May 10 10:12:13 2019 - [info]   192.168.148.37(192.168.148.37:3306)
Fri May 10 10:12:13 2019 - [info] Alive Slaves:
Fri May 10 10:12:13 2019 - [info]   192.168.148.27(192.168.148.27:3306)  Version=10.2.23-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri May 10 10:12:13 2019 - [info]     Replicating from 192.168.148.7(192.168.148.7:3306)
Fri May 10 10:12:13 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri May 10 10:12:13 2019 - [info]   192.168.148.37(192.168.148.37:3306)  Version=10.2.23-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri May 10 10:12:13 2019 - [info]     Replicating from 192.168.148.7(192.168.148.7:3306)
Fri May 10 10:12:13 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri May 10 10:12:13 2019 - [info] Current Alive Master: 192.168.148.7(192.168.148.7:3306)
Fri May 10 10:12:13 2019 - [info] Checking slave configurations..
Fri May 10 10:12:13 2019 - [info] Checking replication filtering settings..
Fri May 10 10:12:13 2019 - [info]  binlog_do_db= , binlog_ignore_db= 
Fri May 10 10:12:13 2019 - [info]  Replication filtering check ok.
Fri May 10 10:12:13 2019 - [info] GTID (with auto-pos) is not supported
Fri May 10 10:12:13 2019 - [info] Starting SSH connection tests..
Fri May 10 10:12:15 2019 - [info] All SSH connection tests passed successfully.
Fri May 10 10:12:15 2019 - [info] Checking MHA Node version..
Fri May 10 10:12:16 2019 - [info]  Version check ok.
Fri May 10 10:12:16 2019 - [info] Checking SSH publickey authentication settings on the current master..
Fri May 10 10:12:16 2019 - [info] HealthCheck: SSH to 192.168.148.7 is reachable.
Fri May 10 10:12:17 2019 - [info] Master MHA Node version is 0.56.
Fri May 10 10:12:17 2019 - [info] Checking recovery script configurations on 192.168.148.7(192.168.148.7:3306)..
Fri May 10 10:12:17 2019 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/bin/ --output_file=/opt/mastermha/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000001 
Fri May 10 10:12:17 2019 - [info]   Connecting to root@192.168.148.7(192.168.148.7:22).. 
  Creating /opt/mastermha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/bin/, up to master-bin.000001
Fri May 10 10:12:17 2019 - [info] Binlog setting check done.
Fri May 10 10:12:17 2019 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri May 10 10:12:17 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=192.168.148.27 --slave_ip=192.168.148.27 --slave_port=3306 --workdir=/opt/mastermha/app1 --target_version=10.2.23-MariaDB-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
Fri May 10 10:12:17 2019 - [info]   Connecting to root@192.168.148.27(192.168.148.27:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to localhost-relay-bin.000002
    Temporary relay log file is /data/mysql/localhost-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri May 10 10:12:18 2019 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=192.168.148.37 --slave_ip=192.168.148.37 --slave_port=3306 --workdir=/opt/mastermha/app1 --target_version=10.2.23-MariaDB-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info  --relay_dir=/data/mysql/  --slave_pass=xxx
Fri May 10 10:12:18 2019 - [info]   Connecting to root@192.168.148.37(192.168.148.37:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/relay-log.info ... ok.
    Relay log found at /data/mysql, up to localhost-relay-bin.000002
    Temporary relay log file is /data/mysql/localhost-relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri May 10 10:12:18 2019 - [info] Slaves settings check done.
Fri May 10 10:12:18 2019 - [info] 
192.168.148.7(192.168.148.7:3306) (current master)
 +--192.168.148.27(192.168.148.27:3306)
 +--192.168.148.37(192.168.148.37:3306)

Fri May 10 10:12:18 2019 - [info] Checking replication health on 192.168.148.27..
Fri May 10 10:12:18 2019 - [info]  ok.
Fri May 10 10:12:18 2019 - [info] Checking replication health on 192.168.148.37..
Fri May 10 10:12:18 2019 - [info]  ok.
Fri May 10 10:12:18 2019 - [warning] master_ip_failover_script is not defined.
Fri May 10 10:12:18 2019 - [warning] shutdown_script is not defined.
Fri May 10 10:12:18 2019 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

# Start the manager; if the master database fails, it performs a master-slave failover
[root@localhost ~]# masterha_manager --conf=/etc/mastermha/app1.cnf 
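Started this way, `masterha_manager` stays in the foreground and exits once a failover has completed. A common alternative is to run it detached and query it with `masterha_check_status`. The sketch below only builds the command lines so they can be reviewed first; the function names and the `OUT` path are assumptions made for this example, while the MHA commands and the two flags come from the MHA Manager tool set.

```shell
#!/usr/bin/env bash
CONF=/etc/mastermha/app1.cnf
# Hypothetical file for the manager's stdout/stderr; any writable path works.
OUT=/opt/mastermha/app1/manager.out

# Command that starts the manager detached from the terminal.
# --remove_dead_master_conf drops the failed master's section from the
# config after a failover; --ignore_last_failover skips the check that
# refuses a new failover shortly after a previous one.
manager_start_cmd() {
    echo "nohup masterha_manager --conf=$CONF --remove_dead_master_conf --ignore_last_failover < /dev/null >> $OUT 2>&1 &"
}

# Commands that report on, or stop, the running manager.
manager_status_cmd() { echo "masterha_check_status --conf=$CONF"; }
manager_stop_cmd()   { echo "masterha_stop --conf=$CONF"; }
```

To actually launch the manager, run `eval "$(manager_start_cmd)"`, then `eval "$(manager_status_cmd)"` to confirm it is monitoring the master.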
Manually kill the master's database processes to observe the failover
[root@node1 ~]# ps -ef | grep mysqld
root       9774      1  0 10:21 ?        00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql --pid-file=/data/mysql/node1.localdomain.pid
mysql      9936   9774  0 10:21 ?        00:00:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysql/node1.localdomain.err --pid-file=/data/mysql/node1.localdomain.pid --socket=/tmp/mysql.sock --port=3306

[root@node1 ~]# kill -9 9774
[root@node1 ~]# kill -9 9936

# The manager node's log shows the master switch
[root@localhost mastermha]# tail /opt/mastermha/app1/manager.log 

Started automated(non-interactive) failover.
The latest slave 192.168.148.27(192.168.148.27:3306) has all relay logs for recovery.
Selected 192.168.148.27(192.168.148.27:3306) as a new master.
192.168.148.27(192.168.148.27:3306): OK: Applying all logs succeeded.
192.168.148.37(192.168.148.37:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.148.37(192.168.148.37:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.148.27(192.168.148.27:3306)
192.168.148.27(192.168.148.27:3306): Resetting slave info succeeded.
Master failover to 192.168.148.27(192.168.148.27:3306) completed successfully.
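MHA does not bring the failed master back by itself; once repaired, it has to be rejoined as a slave of the new master by hand. The manager log normally records a `CHANGE MASTER TO ...` statement for this purpose, with the password masked. The helper below is a sketch that pulls the last such statement out of a log file; the function name `last_change_master` is invented, and the exact wording of the log line is an assumption, so check it against your own manager.log.

```shell
#!/usr/bin/env bash
# Print the last CHANGE MASTER TO statement found in an MHA manager log.
# Assumption: the statement sits on one line and ends with a semicolon.
last_change_master() {
    grep -o 'CHANGE MASTER TO .*;' "$1" | tail -n 1
}
```

For example, `last_change_master /opt/mastermha/app1/manager.log` prints the statement; replace the masked password with the real replication password before running it on the old master, followed by `START SLAVE;`.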

# Check the state on 192.168.148.27 (the new master)
MariaDB [mysql]> show slave hosts;
+-----------+------+------+-----------+
| Server_id | Host | Port | Master_id |
+-----------+------+------+-----------+
|        37 |      | 3306 |        27 |
+-----------+------+------+-----------+

# Check the state on 192.168.148.37 (the remaining slave)
MariaDB [mysql]> show slave status \G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.148.27
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
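Besides `Master_Host`, the fields worth checking after a failover are `Slave_IO_Running` and `Slave_SQL_Running`, which should both be `Yes`. A small filter can turn that check into an exit code; this is a sketch, and `slave_threads_ok` is a name made up for this example.

```shell
#!/usr/bin/env bash
# Read "SHOW SLAVE STATUS\G" output on stdin and succeed only if both
# replication threads report Yes. The trailing colon in each pattern keeps
# fields such as Slave_SQL_Running_State from matching.
slave_threads_ok() {
    awk -F': *' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible use from the manager node, with the account created earlier: `mysql -umhauser -pcentos -h192.168.148.37 -e 'SHOW SLAVE STATUS\G' | slave_threads_ok && echo "replication healthy"`.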