企業中MySQL高可用集羣架構三部曲之MM+keepalived

各位老鐵們，老張與大家又見面了。看到各位在博客裏面給我的留言和訪問量的情況，我很是欣慰，也謝謝大家對我的認可。我寫這些博客，就是想把自己對於MySQL數據庫的一些看法和自己平時的實戰經驗分享出來，我們可以一起探討，共同進步。也保證今後只要一有空就更新博文，推出更多的乾貨。

我的學生經常對我說：“張老師，每次我遇到報錯，有時還是會百度，但是最煩的是不知道百度哪篇帖子說的是正確的".其實這些呢，都是因爲自己還沒有對MySQL數據庫核心知識的不熟悉，和對技術掌握的不牢固。平時下得功夫還是不到位。我們做技術這個行業，還是需要自己給自己加發條，促使自己每天都要學習一些新的知識。理論配合實驗一起，先要學會多問自己幾個問題，一個實驗多做幾遍，可能會得到不同的實驗效果。學習知識要踏實下來，學會多做實驗總結。我想今後再遇到報錯，可能自己就會有一個清晰的解題思路，這個需要一定時間的磨練。

也有人經常問Oracle和MySQL到底有啥區別，其實MySQL數據庫上手很簡單，難的是後期架構的設計與維護。老張三部曲中第一部曲MHA希望對大家在線上部署方面有幫助。

今兒給大家介紹第二部曲，MM+keepalived的環境部署，我們會多種數據庫的架構就可以靈活應用到我們的公司。根據公司業務的不同，選擇合適的集羣架構。

獨家新課程上線>>MySQL體系結構深入剖析及實戰DBA視頻課程

MM+keepalived

簡介：

雙主配合keepalived這種架構設計，也是基於主從複製的原理而搭建的。

使用MySQL主主複製技術+Keepalived是一種簡單、便捷的解決方案，在高可用集羣環境中，keepalived使用VIP，使用Keepalived自帶的服務監控功能和自定義腳本來實現MySQL故障時自動切換，非常靈活。

應用範圍：

一般中小型公司都使用這種架構，搭建比較方便簡單；

可以採用主從或者主主模式，在 master 節點發生故障後，利用 keepalived 高可用機制實現快速切換到 slave 節點。原來的從庫變成新的主庫。

個人建議：

一定要完善好切換腳本，keepalived 的切換機制要合理，避免切換不成功的現象發生。
從庫的配置儘快要與主庫一致，不能太次；避免主庫宕機發生切換，新的主庫（原來的從庫）影響線上業務進行。
對於延遲的問題，在這套架構中，也不能避免。可以使用 mysql 5.7 中增強半同步完成。也可以改變架構使用 PXC，完成時時同步功能，基本上沒有延遲；
keepalived 無法解決腦裂的問題，因此在進行服務異常判斷時，可以修改我們的判斷腳本，通過對第三方節點補充檢測來決定是否進行切換，可降低腦裂問題產生的風險。
採用 keepalived 這個架構，在設置兩節點狀態時，都要設置成不搶佔模式，都是 backup 狀態，通過優先級，來決定誰是主庫。避免腦裂，衝突現象發生。
安裝好 mysql 需要的一些依賴包；建議配置好 yum 源，用 yum 安裝 keepalived 即可。

搭建架構之前理清思路：

首先需要裝有兩臺mysql的數據庫服務器，兩者之間互爲主從模式，都可讀寫。其實就只有一臺服務器A負責數據的寫入工作，而另一臺服務器B作爲我們的備用數據庫；
安裝keepalived的軟件包，個人建議使用yum安裝就可以，很方便。當然我們要知道yum安裝之後的軟件路徑在什麼位置。
整理好keepalived的配置文件，理清keepalived的三種狀態信息。還要準備一個監控MySQL的腳本，便於檢測到宕機順利發生切換的過程。
所有提供服務的從服務器與備用服務器B進行主從同步。----雙主從庫模式
在兩臺服務器A和B，當配置keepalived的參數文件中，要注意兩臺機器都要採用backup這種狀態，就是nopreempt這種非搶佔模式，避免出現衝突，發生腦裂現象。

架構圖展示：

實驗部署環境介紹：

192.168.56.100 主 ---master1

192.168.56.101 備庫---master2

都是乾淨環境沒有任何數據

VIP：192.168.56.111

MySQL數據庫版本5.7.14，採用GTID模式搭建主從環境

注意事項：

兩臺機器的防火牆必須是關閉狀態。
兩臺MySQL數據庫配置文件中server-id絕對不能一樣，要不會報1593這個主從同步的錯誤，導致搭建不成功。

實戰開始：

首先先要搭建兩臺MySQL數據庫爲互爲主從的架構模式。

添加主從同步賬號：

在192.168.56.100：

grant replication slave on *.* to 'bak'@'192.168.56.%' identified by '123456';

flush privileges;

在192.168.56.101：

grant replication slave on *.* to 'bak'@'192.168.56.%' identified by '123456';

flush privileges;

配置同步信息：

先在192.168.56.101上：

CHANGE MASTER TO MASTER_HOST='192.168.56.100',MASTER_USER='bak',MASTER_PASSWORD='123456',master_auto_position=1;

打開主從同步開關：

start slave；

查看主從同步狀態：

root@db 10:49:  [(none)]> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.56.100
                  Master_User: bak
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mybinlog.000005
          Read_Master_Log_Pos: 234
               Relay_Log_File: node2-relay-bin.000010
                Relay_Log_Pos: 365
        Relay_Master_Log_File: mybinlog.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 234
              Relay_Log_Space: 572
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3306100
                  Master_UUID: 913cc1d6-771a-11e7-8fe1-080027f1fd08
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 5a668126-771d-11e7-b14c-08002783b39d:1-2,
913cc1d6-771a-11e7-8fe1-080027f1fd08:1-13
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

從庫兩個工作的線程：IO，SQL都爲yes，代表同步搭建成功
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

再在192.168.56.100上：

配置主從同步信息：

CHANGE MASTER TO MASTER_HOST='192.168.56.101',MASTER_USER='bak',MASTER_PASSWORD='123456',master_auto_position=1;

開啓主從開關：

start slave；

查看主從同步狀態：

root@db 10:49:  [(none)]> show slave status\G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.56.101
                  Master_User: bak
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mybinlog.000005
          Read_Master_Log_Pos: 234
               Relay_Log_File: node1-relay-bin.000008
                Relay_Log_Pos: 365
        Relay_Master_Log_File: mybinlog.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 234
              Relay_Log_Space: 572
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3306101
                  Master_UUID: 5a668126-771d-11e7-b14c-08002783b39d
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 5a668126-771d-11e7-b14c-08002783b39d:1-2,
913cc1d6-771a-11e7-8fe1-080027f1fd08:1-13
                Auto_Position: 1
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

同理
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
從庫兩個工作的線程：IO，SQL都爲yes，代表同步搭建成功。

分別兩臺機器上安裝keepalived的軟件包，可以使用yum安裝方式

yum -y install keepalived;

Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
 * c6-media: 
file:///media/CentOS/repodata/repomd.xml: [Errno 14] Could not open/read file:///media/CentOS/repodata/repomd.xml
Trying other mirror.
file:///media/cdrecorder/repodata/repomd.xml: [Errno 14] Could not open/read file:///media/cdrecorder/repodata/repomd.xml
Trying other mirror.
c6-media                                                                                                       | 4.0 kB     00:00 ... 
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package keepalived.x86_64 0:1.2.7-3.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================================================
 Package                          Arch                         Version                           Repository                      Size
======================================================================================================================================
Installing:
 keepalived                       x86_64                       1.2.7-3.el6                       c6-media                       174 k

Transaction Summary
======================================================================================================================================
Install       1 Package(s)

Total download size: 174 k
Installed size: 526 k
Downloading Packages:
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : keepalived-1.2.7-3.el6.x86_64                                                                                      1/1 
  Verifying  : keepalived-1.2.7-3.el6.x86_64                                                                                      1/1 

Installed:
  keepalived.x86_64 0:1.2.7-3.el6                                                                                                     

Complete!

證明keepalived軟件已經存在：

rpm -qa|grep keepalived
keepalived-1.2.7-3.el6.x86_64

在兩臺機器上分別配置檢測MySQL數據庫的腳本：

首先進入到yum安裝後的軟件目錄下：

cd /etc/keepalived
vim check_mysql.py

腳本通過查看MySQL進程的存在，和是否可以連接，來判斷MySQL的狀態

（這裏只展現了腳本中判斷的一部分）

    def existMySQL(self):
        cmd="ps -ef | egrep -i \"mysqld\" | grep %s | egrep -iv \"mysqld_safe\" | grep -v grep | wc -l" % self.dbport
        mysqldNum = os.popen(cmd).read()
        cmd ="netstat -tunlp | grep \":%s\" | wc -l" % self.dbport
        mysqlPortNum= os.popen(cmd).read()
        #print mysqldNum, mysqlPortNum
        if ( int(mysqldNum) <= 0):
            print "error"
            return 1
        if ( int(mysqldNum) > 0 and  mysqlPortNum <= 0):
            return 1
        return 0

    def connect(self):
    #    print "in db conn"
#        print "host : %s, port: %d, user: %s, password: %s" % (self.dbhost, self.dbport, self.dbuser, self.dbpassword)
        try:
            self.conn=MySQLdb.connect(host="%s"%self.dbhost, port=self.dbport,user="%s"%dbuser, passwd="%s"%self.dbpassword)
        except Exception, e:
#            print " Error"
            print e
            return 1
        return 0
    def disconnect(self):
        if (self.conn):
            self.conn.close()
            self.conn = None

在兩臺機器上修改keepalived的配置文件：

192.168.56.100上面

vim /etc/keepalived/keepalived.conf

vrrp_instance VI_100 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 100
    priority 100
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        vs_mysql_100
    }
    virtual_ipaddress {
        192.168.56.111
    }
}

192.168.56.101上面：

vrrp_instance VI_101 {
    state BACKUP
    nopreempt
    interface eth0
    virtual_router_id 100
    priority 90
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        vs_mysql_101
    }
    virtual_ipaddress {
        192.168.56.111
    }
}

總結：可以看到兩臺機器的state都是backup並且都是非搶佔模式nopreempt，通過優先級的高低來決定誰是主庫。（這裏192.168.56.100是主）還有注意virtual_router_id(虛擬路由id）要保持一致。

接下來可以啓動兩臺機器的keepalived進程：

192.168.56.100：

[root@node1 keepalived]# /etc/init.d/keepalived start
Starting keepalived:               [  OK  ]

觀察日誌中信息的變化：

cat /var/log/messages

Aug  3 10:52:17 node1 Keepalived_vrrp[6760]: Opening file '/etc/keepalived/keepalived.conf'.
Aug  3 10:52:17 node1 Keepalived_vrrp[6760]: Configuration is using : 62711 Bytes
Aug  3 10:52:17 node1 Keepalived_vrrp[6760]: Using LinkWatch kernel netlink reflector...
Aug  3 10:52:17 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) Entering BACKUP STATE
Aug  3 10:52:17 node1 Keepalived_vrrp[6760]: VRRP sockpool: [ifindex(2), proto(112), fd(11,12)]
Aug  3 10:52:17 node1 Keepalived_healthcheckers[6759]: Opening file '/etc/keepalived/keepalived.conf'.
Aug  3 10:52:17 node1 Keepalived_healthcheckers[6759]: Configuration is using : 4431 Bytes
Aug  3 10:52:17 node1 Keepalived_healthcheckers[6759]: Using LinkWatch kernel netlink reflector...
Aug  3 10:52:18 node1 Keepalived_vrrp[6760]: VRRP_Script(vs_mysql_100) succeeded
Aug  3 10:52:33 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) Transition to MASTER STATE
Aug  3 10:52:38 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) Entering MASTER STATE
Aug  3 10:52:38 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) setting protocol VIPs.
Aug  3 10:52:38 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) Sending gratuitous ARPs on eth0 for 192.168.56.111
Aug  3 10:52:38 node1 Keepalived_healthcheckers[6759]: Netlink reflector reports IP 192.168.56.111 added
Aug  3 10:52:43 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) Sending gratuitous ARPs on eth0 for 192.168.56.111

可以看到它從backup狀態切換到master的狀態了，並且發送了一個廣播協議，證明192.168.56.111已經在本臺機器上面，其他機器不要再使用了。

[root@node1 keepalived]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:f1:fd:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.100/24 brd 192.168.56.255 scope global eth0
    inet 192.168.56.111/32 scope global eth0
    inet6 fe80::a00:27ff:fef1:fd08/64 scope link 
       valid_lft forever preferred_lft forever

192.168.56.101:

[root@node2 ~]# /etc/init.d/keepalived start
Starting keepalived:               [  OK  ]

cat /var/log/messages

Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Interface queue is empty
Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Netlink reflector reports IP 192.168.56.101 added
Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Netlink reflector reports IP fe80::a00:27ff:fe83:b39d added
Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Registering Kernel netlink reflector
Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Registering Kernel netlink command channel
Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Opening file '/etc/keepalived/keepalived.conf'.
Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Configuration is using : 4561 Bytes
Aug  3 10:53:28 node2 Keepalived_healthcheckers[8111]: Using LinkWatch kernel netlink reflector...
Aug  3 10:53:28 node2 Keepalived_vrrp[8112]: Opening file '/etc/keepalived/keepalived.conf'.
Aug  3 10:53:28 node2 Keepalived_vrrp[8112]: Configuration is using : 62577 Bytes
Aug  3 10:53:28 node2 Keepalived_vrrp[8112]: Using LinkWatch kernel netlink reflector...
Aug  3 10:53:28 node2 Keepalived_vrrp[8112]: VRRP_Instance(VI_101) Entering BACKUP STATE
Aug  3 10:53:28 node2 Keepalived_vrrp[8112]: VRRP sockpool: [ifindex(2), proto(112), fd(12,13)]
Aug  3 10:53:28 node2 Keepalived_vrrp[8112]: VRRP_Script(vs_mysql_101) succeeded

這臺服務器就是正常的backup狀態，時刻準備着接管主庫的服務。

模擬一下主庫宕機的故障切換：

主庫192.168.56.100上面執行關閉MySQL服務操作：

mysqladmin -uroot -proot123 shutdown

這時再查看一下keepalived日誌的情況：

cat /var/log/messages
Aug  3 11:04:17 node1 Keepalived_vrrp[6760]: VRRP_Script(vs_mysql_100) failed
Aug  3 11:04:18 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) Entering FAULT STATE
Aug  3 11:04:18 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) removing protocol VIPs.
Aug  3 11:04:18 node1 Keepalived_healthcheckers[6759]: Netlink reflector reports IP 192.168.56.111 removed
Aug  3 11:04:18 node1 Keepalived_vrrp[6760]: VRRP_Instance(VI_100) Now in FAULT state

已經把vip removed了，並且keepalived的狀態變成了fault

在備庫192.168.56.101上面查看日誌：

cat /var/log/messages
Aug  3 11:04:29 node2 Keepalived_vrrp[8112]: VRRP_Instance(VI_101) Transition to MASTER STATE
Aug  3 11:04:34 node2 Keepalived_vrrp[8112]: VRRP_Instance(VI_101) Entering MASTER STATE
Aug  3 11:04:34 node2 Keepalived_vrrp[8112]: VRRP_Instance(VI_101) setting protocol VIPs.
Aug  3 11:04:34 node2 Keepalived_vrrp[8112]: VRRP_Instance(VI_101) Sending gratuitous ARPs on eth0 for 192.168.56.111
Aug  3 11:04:34 node2 Keepalived_healthcheckers[8111]: Netlink reflector reports IP 192.168.56.111 added
Aug  3 11:04:39 node2 Keepalived_vrrp[8112]: VRRP_Instance(VI_101) Sending gratuitous ARPs on eth0 for 192.168.56.111

備庫已經從backup狀態切換成master狀態了。並且VIP（192.168.56.111）已經切換過來。

查看ip地址：

[root@node2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:83:b3:9d brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.101/24 brd 192.168.56.255 scope global eth0
    inet 192.168.56.111/32 scope global eth0
    inet6 fe80::a00:27ff:fe83:b39d/64 scope link 
       valid_lft forever preferred_lft forever

主備庫切換成功。

實戰演練過程結束，希望對大家學習MySQL高可用集羣有幫助。

今後我們可能還會遇到其他的MySQL高可用架構，學習它的時候，先不要忙於搭建，要先弄清原理，整理好實驗過程的思路，遇到報錯，一步步地去排查。自己的水平也會在這個歷練的過程中，得到提升。

今後我們可以一起討論，在寫博的過程中難免可能會有一些筆誤，或是想不周全的地方，希望大家諒解。有不對的地方歡迎各位老鐵指定。（MySQL高可用集羣第二部曲完結）

第一步曲直通車>> 企業中MySQL主流高可用架構實戰三部曲之MHA

企業中MySQL高可用集羣架構三部曲之MM+keepalived

認知提升的方法

螞蟻面試：Springcloud核心組件的底層原理，你知道多少？

記一次實驗mongodb （save，insert ）區別

記一次索引構建問題

mongodb 索引問題的介紹

記一次mongdb搭建複製集的小故障

沒有宮廷內鬥，數據庫界的延禧攻略

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結