Building a Highly Available Web Cluster with heartbeat v1 + NFS (Part 1)

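Part 1: The layers involved in cluster decision-making

· Messaging Layer: the layer that carries heartbeat messages.

· ha_aware: software that makes cluster decisions itself. It uses the underlying messaging layer directly, calling its API to reach those decisions.

· DC (Designated Coordinator): the elected coordinator. It exists so that the remaining nodes do not fight over resources after the primary node dies, and it is elected by those nodes.

· CRM (Cluster Resource Manager, the "chairman of the board"): makes the decisions. In an HA cluster no resource should start on its own; the CRM decides whether it starts.

· LRM (Local Resource Manager, the "general manager"): carries out the CRM's decisions. It is what actually manages local resources: starting them, stopping them, and monitoring their state.

· RA (Resource Agent): the tool that, under the CRM's scheduling, manages one particular resource on a node. It is usually a script, and every configured resource depends on some script or program. An RA must accept four arguments {start|stop|status|restart}, and status may only print running or stopped.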
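
The four-argument contract described for an RA can be sketched as a small shell script. This is only an illustration under stated assumptions: the "resource" is a placeholder sleep process tracked through a PID file, and the names demo_ra, ra_start, ra_stop, ra_status and the /tmp path are made up for the example, not taken from heartbeat.

```shell
#!/bin/sh
# Minimal resource-agent sketch (hypothetical; not shipped with heartbeat).
# ASSUMPTION: the managed "resource" is a background sleep process whose
# PID is stored in $PIDFILE.
PIDFILE="${PIDFILE:-/tmp/demo_ra.pid}"

ra_status() {
    # Print exactly "running" or "stopped", as the contract requires.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo running
    else
        echo stopped
    fi
}

ra_start() {
    # Starting an already running resource is treated as success.
    [ "$(ra_status)" = running ] && return 0
    sleep 3600 >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}

ra_stop() {
    if [ "$(ra_status)" = running ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
    fi
    rm -f "$PIDFILE"
}

demo_ra() {
    case "$1" in
        start)   ra_start ;;
        stop)    ra_stop ;;
        status)  ra_status ;;
        restart) ra_stop; ra_start ;;
        *)       echo "Usage: demo_ra {start|stop|status|restart}" >&2
                 return 1 ;;
    esac
}
```

Calling demo_ra start and then demo_ra status should print running; after demo_ra stop, status should print stopped. A real agent would replace the sleep process with the service it manages.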

failover: resources move away from a failed node

failback: resources move back once the failed node has recovered

Software used at each layer

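· Messaging Layer:

heartbeat v1, v2, v3

corosync (OpenAIS): the Service Availability Forum defines open industry standards; to make its ideas familiar, OpenAIS released a sample implementation, and corosync comes from it.

cman (Red Hat): the cluster manager.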

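· CRM (cluster resource manager): can be used on its own with any messaging layer whose interface it is compatible with.

heartbeat v1: haresources, a configuration file; it is only a configuration interface.

heartbeat v2: crm (every node runs the crmd process; port 5566; the client, crmsh (a shell), has a poor user experience). Because it is hard to configure, a graphical front end, heartbeat-GUI, was provided.

heartbeat v3: heartbeat + pacemaker + cluster-glue (the glue layer):

pacemaker: (split off into an independent project)

Configuration interfaces: third-party open configuration tools

CLI: command-line tools, crm (SUSE) and pcs, written in Python

GUI: hawk (web interface), LCMC (desktop interface), pacemaker-mgmt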

cman + rgmanager:

resource group manager; Failover Domain

Configuration interface:

RHCS: RedHat Cluster Suite

Configuration interface: Conga (covers the full life cycle): installation, configuration, resources, startup

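· RA types

heartbeat legacy: heartbeat's traditional type

LSB: the scripts under /etc/rc.d/init.d/*

OCF: Open Cluster Framework
provider: the organization that supplies the resource agent scripts, for example pacemaker
linbit: resource agents supplied by LINBIT

STONITH: node fencing

· keepalived: lightweight, and different in style from the tools above. It uses the VRRP protocol to move IP address resources between nodes, and achieves high availability through its own built-in interface for calling scripts. VRRP: Virtual Router Redundancy Protocol.
Use cases:
keepalived + ipvs
keepalived + haproxy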

· HA cluster solutions on RHEL or CentOS:
Red Hat 5:
    Bundled: RHCS (cman + rgmanager)
    Third-party options: corosync + pacemaker, heartbeat (v1 or v2), keepalived
Red Hat 6:
    Bundled: RHCS (cman + rgmanager)
    corosync + rgmanager
    cman + pacemaker
    heartbeat v3 + pacemaker
    keepalived
Typical uses:
    HA for a front-end load balancer: keepalived
    Large HA clusters: corosync or (cman + pacemaker), which supports up to 100 nodes

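With shared storage, file system metadata is cached in memory for speed. If two cluster nodes write to the same data at once, the file system can be corrupted.

Ways to fence a resource:

Hardware may be needed: 1. a management chip (requires an authentication mechanism); 2. a power switch that cuts the node's power; 3. ssh.

A two-node cluster is a special case: when the link between the nodes breaks, neither side can decide who should run the resources. A ping node can be used to break the tie.

Quorum devices:

1 ping node, ping group

2 qdisk (Red Hat): a chosen disk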
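
The ping-node idea can be reduced to a small decision rule. The sketch below only illustrates that reasoning and is not heartbeat's own logic; the function name may_take_over and its yes/no arguments are hypothetical.

```shell
# Two-node tie-breaker sketch (hypothetical helper, not part of heartbeat).
#   $1: is the peer's heartbeat still seen?   yes|no
#   $2: does the ping node answer this node?  yes|no
# Succeeds (exit 0) only when the peer is silent AND the ping node is
# reachable, i.e. the fault is more likely on the peer's side than ours.
may_take_over() {
    [ "$1" = no ] && [ "$2" = yes ]
}

if may_take_over no yes; then
    echo "take over"
else
    echo "stay passive"
fi
```

If the peer is silent but the ping node is unreachable too, this node is probably the isolated one, so the rule keeps it passive.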

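When the standby node takes over, the other cluster node must be prevented from still accessing the resource and corrupting the file system, so that node has to be killed completely.

Part 2: Configuration and deployment

1. Make sure both nodes have the right host name, that the names resolve, and that their clocks agree. An NTP server can keep the time in sync; install one yourself. Apply the same configuration on both nodes.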

[root@node1 ~]# ntpdate 172.16.0.1
17 Apr 22:12:41 ntpdate[2079]: adjust time server 172.16.0.1 offset -0.001233 sec
[root@node1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1.nyist.com
[root@node1 ~]# uname -n
node1.nyist.com
[root@node1 ~]# vim /etc/hosts
172.16.20.31 node1.nyist.com node1
172.16.20.32 node2.nyist.com node2
[root@node1 ~]# ping node1
PING node1.nyist.com (172.16.20.31) 56(84) bytes of data.
64 bytes from node1.nyist.com (172.16.20.31): icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from node1.nyist.com (172.16.20.31): icmp_seq=2 ttl=64 time=0.0

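2. Set up ssh key authentication between the two nodes, so that each can manage the other over ssh. This only simplifies the work; if you do not mind typing the password again and again, you can skip this step. (Do the following on both nodes.)

[root@node1 ~]# ssh-keygen -t rsa -P ''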
[root@node1 ~]# ls .ssh/
authorized_keys  id_rsa  id_rsa.pub
[root@node1 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]
[root@node1 ~]# ssh node2
Last login: Thu Apr 17 22:24:51 2014 from 172.16.20.55
[root@node2 ~]#

3. OK, let's install the software (on both nodes).

[root@node1 heartbeat2]# yum install perl-TimeDate PyXML libnet net-snmp-libs

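First resolve some dependencies with yum, then install heartbeat with rpm. Note: do not install heartbeat itself with yum, because a version mismatch can make yum replace some of the packages heartbeat depends on.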
[root@node1 heartbeat2]# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm

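4. With the packages installed, let's configure them. Installing heartbeat does not create any configuration files, but it ships samples, so copy them into place. Note that authkeys must have mode 600, because it is security-related.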

[root@node1 ha.d]# cp /usr/share/doc/heartbeat-2.1.4/authkeys /etc/ha.d
[root@node1 ha.d]# cp /usr/share/doc/heartbeat-2.1.4/ha.cf /etc/ha.d
[root@node1 ha.d]# cp /usr/share/doc/heartbeat-2.1.4/haresources /etc/ha.d
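
Since the mode of authkeys matters, it can be worth checking it mechanically before starting the service. A minimal sketch, assuming GNU coreutils stat (its -c option) as found on CentOS; the helper name authkeys_mode_ok is hypothetical.

```shell
# Succeed only if the given file has mode exactly 600.
# ASSUMPTION: GNU stat; "-c %a" prints the octal permission bits.
authkeys_mode_ok() {
    [ "$(stat -c %a "$1" 2>/dev/null)" = 600 ]
}

# Example use (the path is the one from this article):
if authkeys_mode_ok /etc/ha.d/authkeys; then
    echo "authkeys mode is 600"
else
    echo "authkeys is missing or its mode is not 600" >&2
fi
```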

5. Edit the configuration parameters to suit your needs

[root@node1 ha.d]# vim authkeys
auth 2
#1 crc
2 sha1 redhat
#3 md5 Hello!
[root@node1 ha.d]# chmod 600 authkeys
[root@node1 ha.d]# vim ha.cf--------main configuration file
#debugfile /var/log/ha-debug------enable while debugging
#       File to write other messages to
#
logfile /var/log/ha-log---------------log file location
#logfacility    local0--------------syslog facility; local7 also works
keepalive 1500ms-------------------interval between heartbeats
#
#       deadtime: how long-to-declare-host-dead?
deadtime 6--------------------time after which a node is declared dead
warntime 3-------------------time after which a late heartbeat triggers a warning
#       serial  serialportname ...---used with a serial heartbeat cable
#serial /dev/ttyS0      # Linux
#serial /dev/cuaa0      # FreeBSD
#serial /dev/cuad0      # FreeBSD 6.x
#serial /dev/cua/a      # Solaris
mcast eth0 225.0.100.1 694 1 0---------send heartbeats by multicast
auto_failback on-------------------whether resources fail back automatically
#stonith baytech /etc/ha.d/conf/stonith.baytech-------STONITH device; enable once you have one
node    node1.nyist.com-------------------declares the cluster nodes, one line each
node    node2.nyist.com
#debug 1------------------------------------debug level
#compression    bz2-------------compression
#compression_threshold 2
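
To get a feel for what the timers in ha.cf mean, they can be expressed as a number of heartbeat intervals. This is a rough sketch, assuming the ha.cf convention that a bare number is in seconds and an ms suffix means milliseconds; the helper names to_ms and intervals_in are made up for the example.

```shell
# Convert an ha.cf time value to milliseconds.
# ASSUMPTION: "1500ms" is milliseconds, a bare number such as "6" is seconds.
to_ms() {
    case "$1" in
        *ms) echo "${1%ms}" ;;
        *)   echo $(( $1 * 1000 )) ;;
    esac
}

# How many keepalive intervals ($2) fit into a timer ($1)?
intervals_in() {
    echo $(( $(to_ms "$1") / $(to_ms "$2") ))
}

echo "warntime: $(intervals_in 3 1500ms) heartbeat intervals"
echo "deadtime: $(intervals_in 6 1500ms) heartbeat intervals"
```

With keepalive 1500ms, warntime 3 and deadtime 6, this reports 2 intervals for warntime and 4 for deadtime: a warning is logged after roughly two missed heartbeats, and the peer is declared dead after roughly four.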

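Give each node a simple test page, then check that httpd serves it correctly. Remember: the service must not be set to start automatically at boot.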

[root@node1 ha.d]# curl node1
<h1>172.16.20.31@@@node1</h1>
[root@node1 ha.d]# curl node2
curl: (7) couldn't connect to host
[root@node1 ha.d]# curl node2
<h1>172.16.20.32@@@node2</h1>
[root@node1 ha.d]# chkconfig httpd off
[root@node1 ha.d]# ssh node2 'chkconfig httpd off'

6. Define the resources for the resource manager so it can start providing the service.

[root@node1 ha.d]# vim haresources
node1.nyist.com 172.16.20.100/16/eth0 httpd
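This defines the virtual IP: 172.16.20.100

(Resources defined: 1. the IP, 2. the httpd service)

IP: heartbeat automatically looks in /etc/ha.d/resource.d for the script that configures the IP address.

The httpd service is likewise looked up in that directory, and also in /etc/rc.d/init.d/.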

[root@node1 ha.d]# ls /etc/ha.d/resource.d/
apache        IPaddr             OCF
AudibleAlarm  IPaddr2            portblock
db2           IPsrcaddr          Raid1
Delay         IPv6addr           SendArp
Filesystem    LinuxSCSI          ServeRAID
hto-mapfuncs  LVM                WAS
ICP           LVSSyncDaemonSwap  WinPopup
ids           MailTo             Xinetd

Copy the files to node 2

[root@node1 ha.d]# scp ha.cf authkeys haresources root@node2:/etc/ha.d

7. Start the service

[root@node1 ~]# service heartbeat start
logd is already running
Starting High-Availability services:
2014/04/17_23:46:35 INFO:  Resource is stopped
2014/04/17_23:46:35 INFO:  Resource is stopped
Done.
[root@node1 ~]# ssh node2 'service heartbeat start'
[root@node1 ha.d]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:02:06:8E
          inet addr:172.16.20.31  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::20c:29ff:fe02:68e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:33997 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6394 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:24361024 (23.2 MiB)  TX bytes:777306 (759.0 KiB)
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:02:06:8E
          inet addr:172.16.20.100  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:78 errors:0 dropped:0 overruns:0 frame:0
          TX packets:78 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9562 (9.3 KiB)  TX bytes:9562 (9.3 KiB)

8. The log shows one node trying to reach its peer; when the timeout expired, the peer was automatically marked dead

[root@node1 ~]# tail /var/log/ha-log -f
heartbeat[2600]: 2014/04/18_18:41:12 info: **************************
heartbeat[2600]: 2014/04/18_18:41:12 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat[2601]: 2014/04/18_18:41:12 info: heartbeat: version 2.1.4
heartbeat[2601]: 2014/04/18_18:41:12 info: Heartbeat generation: 1397817331
heartbeat[2601]: 2014/04/18_18:41:12 info: glib: UDP multicast heartbeat started for group 228.15.100.1 port 694 interface eth0 (ttl=1 loop=0)
heartbeat[2601]: 2014/04/18_18:41:12 info: glib: ping heartbeat started.
heartbeat[2601]: 2014/04/18_18:41:12 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[2601]: 2014/04/18_18:41:12 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[2601]: 2014/04/18_18:41:12 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[2601]: 2014/04/18_18:41:12 info: Local status now set to: 'up'
heartbeat[2601]: 2014/04/18_18:41:13 info: Link 172.16.0.1:172.16.0.1 up.
heartbeat[2601]: 2014/04/18_18:41:13 info: Status update for node 172.16.0.1: status ping
heartbeat[2601]: 2014/04/18_18:41:31 WARN: node node2.nyist.com: is dead
heartbeat[2601]: 2014/04/18_18:41:31 info: Comm_now_up(): updating status to active
heartbeat[2601]: 2014/04/18_18:41:31 info: Local status now set to: 'active'
heartbeat[2601]: 2014/04/18_18:41:31 WARN: No STONITH device configured.
heartbeat[2601]: 2014/04/18_18:41:31 WARN: Shared disks are not protected.
heartbeat[2601]: 2014/04/18_18:41:31 info: Resources being acquired from node2.nyist.com.
harc[2613]:2014/04/18_18:41:31 info: Running /etc/ha.d/rc.d/status status
mach_down[2644]:2014/04/18_18:41:31 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[2644]:2014/04/18_18:41:31 info: mach_down takeover complete for node node2.nyist.com.
heartbeat[2601]: 2014/04/18_18:41:31 info: mach_down takeover complete.
heartbeat[2601]: 2014/04/18_18:41:31 info: Initial resource acquisition complete (mach_down)
IPaddr[2687]:2014/04/18_18:41:31 INFO:  Resource is stopped
heartbeat[2614]: 2014/04/18_18:41:31 info: Local Resource acquisition completed.
harc[2747]:2014/04/18_18:41:31 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[2747]:2014/04/18_18:41:31 received ip-request-resp 172.16.20.100/16/eth0 OK yes
ResourceManager[2766]:2014/04/18_18:41:31 info: Acquiring resource group: node1.nyist.com 172.16.20.100/16/eth0 httpd
IPaddr[2792]:2014/04/18_18:41:31 INFO:  Resource is stopped
ResourceManager[2766]:2014/04/18_18:41:31 info: Running /etc/ha.d/resource.d/IPaddr 172.16.20.100/16/eth0 start
IPaddr[2889]:2014/04/18_18:41:31 INFO: Using calculated netmask for 172.16.20.100: 255.255.0.0
IPaddr[2889]:2014/04/18_18:41:31 INFO: eval ifconfig eth0:0 172.16.20.100 netmask 255.255.0.0 broadcast 172.16.255.255
IPaddr[2860]:2014/04/18_18:41:31 INFO:  Success
ResourceManager[2766]:2014/04/18_18:41:31 info: Running /etc/init.d/httpd  start
heartbeat[2601]: 2014/04/18_18:41:41 info: Local Resource acquisition completed. (none)
heartbeat[2601]: 2014/04/18_18:41:41 info: local resource transition completed.
heartbeat[2601]: 2014/04/18_18:43:42 info: Link node2.nyist.com:eth0 up.
heartbeat[2601]: 2014/04/18_18:43:42 info: Status update for node node2.nyist.com: status init
heartbeat[2601]: 2014/04/18_18:43:42 info: Status update for node node2.nyist.com: status up
harc[3034]:2014/04/18_18:43:42 info: Running /etc/ha.d/rc.d/status status
harc[3050]:2014/04/18_18:43:42 info: Running /etc/ha.d/rc.d/status status
heartbeat[2601]: 2014/04/18_18:43:43 info: Status update for node node2.nyist.com: status active
harc[3065]:2014/04/18_18:43:43 info: Running /etc/ha.d/rc.d/status status
heartbeat[2601]: 2014/04/18_18:43:44 info: remote resource transition completed.
heartbeat[2601]: 2014/04/18_18:43:44 info: node1.nyist.com wants to go standby [foreign]
heartbeat[2601]: 2014/04/18_18:43:44 info: standby: node2.nyist.com can take our foreign resources
heartbeat[3080]: 2014/04/18_18:43:44 info: give up foreign HA resources (standby).
heartbeat[3080]: 2014/04/18_18:43:45 info: foreign HA resource release completed (standby).
heartbeat[2601]: 2014/04/18_18:43:45 info: Local standby process completed [foreign].
heartbeat[2601]: 2014/04/18_18:43:45 WARN: 1 lost packet(s) for [node2.nyist.com] [11:13]
heartbeat[2601]: 2014/04/18_18:43:45 info: remote resource transition completed.
heartbeat[2601]: 2014/04/18_18:43:45 info: No pkts missing from node2.nyist.com!
heartbeat[2601]: 2014/04/18_18:43:45 info: Other node completed standby takeover of foreign resources.
Ourselves
[root@node2 ha.d]# tail /var/log/ha-log
heartbeat[17427]: 2014/04/18_18:43:41 info: **************************
heartbeat[17427]: 2014/04/18_18:43:41 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat[17428]: 2014/04/18_18:43:41 info: heartbeat: version 2.1.4
heartbeat[17428]: 2014/04/18_18:43:41 info: Heartbeat generation: 1397749464
heartbeat[17428]: 2014/04/18_18:43:41 info: glib: UDP multicast heartbeat started for group 228.15.100.1 port 694 interface eth0 (ttl=1 loop=0)
heartbeat[17428]: 2014/04/18_18:43:41 info: glib: ping heartbeat started.
heartbeat[17428]: 2014/04/18_18:43:41 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[17428]: 2014/04/18_18:43:41 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[17428]: 2014/04/18_18:43:41 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[17428]: 2014/04/18_18:43:42 info: Local status now set to: 'up'
heartbeat[17428]: 2014/04/18_18:43:42 info: Link 172.16.0.1:172.16.0.1 up.
heartbeat[17428]: 2014/04/18_18:43:42 info: Status update for node 172.16.0.1: status ping
heartbeat[17428]: 2014/04/18_18:43:43 info: Link node1.nyist.com:eth0 up.
heartbeat[17428]: 2014/04/18_18:43:43 info: Status update for node node1.nyist.com: status active
harc[17440]:2014/04/18_18:43:43 info: Running /etc/ha.d/rc.d/status status
heartbeat[17428]: 2014/04/18_18:43:43 info: Comm_now_up(): updating status to active
heartbeat[17428]: 2014/04/18_18:43:43 info: Local status now set to: 'active'
heartbeat[17428]: 2014/04/18_18:43:44 info: remote resource transition completed.
heartbeat[17428]: 2014/04/18_18:43:44 info: remote resource transition completed.
heartbeat[17428]: 2014/04/18_18:43:44 info: Local Resource acquisition completed. (none)
heartbeat[17428]: 2014/04/18_18:43:44 info: node1.nyist.com wants to go standby [foreign]
heartbeat[17428]: 2014/04/18_18:43:45 info: standby: acquire [foreign] resources from node1.nyist.com
heartbeat[17458]: 2014/04/18_18:43:45 info: acquire local HA resources (standby).
heartbeat[17458]: 2014/04/18_18:43:45 info: local HA resource acquisition completed (standby).
heartbeat[17428]: 2014/04/18_18:43:45 info: Standby resource acquisition done [foreign].
heartbeat[17428]: 2014/04/18_18:43:45 info: Initial resource acquisition complete (auto_failback)
heartbeat[17428]: 2014/04/18_18:43:45 info: remote resource transition completed.

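9. Verify the failover in a browser.

When node1 is shut down, the resources move to node2. When node1 comes back up they move back to node1, because we gave node1 priority for them (it is the node named in haresources, and auto_failback is on).

Using NFS as shared storage for the web service (assuming the export must not be mounted by both nodes at the same time)

Set up the NFS server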

[root@My2 ~]# mkdir /www/htdoc -p
[root@My2 ~]# setfacl -m u:apache:rwx /www/htdoc/
[root@My2 ~]# vim /etc/exports
/www/htdoc 172.16.0.0/16(rw)
[root@My2 ~]# vim /www/htdoc/index.html
<h1>from NFS server</h1>
[root@My2 ~]# service nfs start
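
One detail of /etc/exports is easy to get wrong: the option list must follow the client with no space in between. With a space, the client gets the default options and the bracketed options apply to every other host instead. The check below is a simple heuristic sketch for catching that; the helper name exports_line_ok is hypothetical.

```shell
# Succeed if an exports line has no whitespace directly before "(".
# Heuristic only: it does not parse the full exports(5) syntax.
exports_line_ok() {
    ! printf '%s\n' "$1" | grep -q '[[:space:]]('
}

for line in '/www/htdoc 172.16.0.0/16(rw)' '/www/htdoc 172.16.0.0/16 (rw)'; do
    if exports_line_ok "$line"; then
        echo "ok:      $line"
    else
        echo "suspect: $line"
    fi
done
```

The first line is reported as ok and the second as suspect.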

Edit the resources on each node:

[root@node1 ha.d]# vim haresources
node1.nyist.com 172.16.20.100/16/eth0 Filesystem::172.16.20.62:/www/htdoc::/var/www/html::nfs httpd
Three resources are defined, and their order matters:
    172.16.20.100/16/eth0
    172.16.20.62:/www/htdoc
    httpd
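
How each field of a haresources line turns into a script call can be read off the log lines in this article (for example "Running /etc/ha.d/resource.d/Filesystem 172.16.20.62:/www/htdoc /var/www/html nfs start"). The sketch below mimics that expansion for illustration only. It is not heartbeat's parser, expand_resource is a made-up name, and treating any field that starts with a digit as the IPaddr shorthand is an assumption.

```shell
# Expand one haresources resource field into "script args... action".
# ASSUMPTIONS: a field starting with a digit is shorthand for IPaddr;
# otherwise the field has the form script::arg1::arg2...
expand_resource() {
    field=$1
    action=$2
    case "$field" in
        [0-9]*)
            echo "IPaddr $field $action"
            ;;
        *)
            script=${field%%::*}          # text before the first "::"
            args=
            if [ "$script" != "$field" ]; then
                # Remaining "::"-separated parts become separate arguments.
                args=$(printf '%s' "${field#*::}" | sed 's/::/ /g')
            fi
            echo "$script${args:+ $args} $action"
            ;;
    esac
}

# Resources are listed left to right, which is the order they start in.
for r in 172.16.20.100/16/eth0 \
         'Filesystem::172.16.20.62:/www/htdoc::/var/www/html::nfs' \
         httpd; do
    expand_resource "$r" start
done
```

The loop prints one line per resource: the IPaddr call with the VIP, the Filesystem call with the export, mount point and type as separate arguments, and finally httpd start.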

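Restart the heartbeat service to see the result. Test: after one node is stopped, not only is the VIP released there, the NFS mount is also unmounted automatically, so that node no longer holds on to the data.

Log messages from the failover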

IPaddr[22187]:2014/04/18_22:10:40 INFO:  Resource is stopped
ResourceManager[22161]:2014/04/18_22:10:40 info: Running /etc/ha.d/resource.d/IPaddr 172.16.20.100/16/eth0 start
IPaddr[22284]:2014/04/18_22:10:40 INFO: Using calculated netmask for 172.16.20.100: 255.255.0.0
IPaddr[22284]:2014/04/18_22:10:40 INFO: eval ifconfig eth0:0 172.16.20.100 netmask 255.255.0.0 broadcast 172.16.255.255
IPaddr[22255]:2014/04/18_22:10:40 INFO:  Success
Filesystem[22385]:2014/04/18_22:10:40 INFO:  Resource is stopped
ResourceManager[22161]:2014/04/18_22:10:40 info: Running /etc/ha.d/resource.d/Filesystem 172.16.20.62:/www/htdoc /var/www/html nfs start
Filesystem[22463]:2014/04/18_22:10:41 INFO: Running start for 172.16.20.62:/www/htdoc on /var/www/html
Filesystem[22452]:2014/04/18_22:10:41 INFO:  Success
ResourceManager[22161]:2014/04/18_22:10:41 info: Running /etc/init.d/httpd  start
mach_down[22136]:2014/04/18_22:10:41 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[22136]:2014/04/18_22:10:41 info: mach_down takeover complete for node node1.nyist.com.
heartbeat[22097]: 2014/04/18_22:10:41 info: mach_down takeover complete.
heartbeat[22097]: 2014/04/18_22:10:41 info: Initial resource acquisition complete (mach_down)
heartbeat[22097]: 2014/04/18_22:10:51 info: Local Resource acquisition completed. (none)
heartbeat[22097]: 2014/04/18_22:10:51 info: local resource transition completed.

Test page
