Basic topology:
Two highly available nodes:
node1: 192.168.191.112
node2: 192.168.191.113
NFS server: 192.168.191.111
Floating IP for the web service: 192.168.191.199
Part 1: Preparation
1). node1 <-> node2: hostname-based communication
1. Edit /etc/hosts on both nodes and add the following:
192.168.191.112 node1.liaobin.com node1
192.168.191.113 node2.liaobin.com node2
2. Edit /etc/sysconfig/network on each node and set the hostname to node1.liaobin.com and node2.liaobin.com respectively
3. Reboot
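The hosts entries above can be sketched as a small script. This is a dry run that writes to a temporary file so it can be tested anywhere; on the real nodes you would append the same lines to /etc/hosts itself (names and IPs are the ones used in this guide):

```shell
# Dry run: build the /etc/hosts entries for both cluster nodes in a temp file.
HOSTS_FILE=$(mktemp)
cat >> "$HOSTS_FILE" <<'EOF'
192.168.191.112 node1.liaobin.com node1
192.168.191.113 node2.liaobin.com node2
EOF
# Sanity check: both cluster nodes should be listed.
grep -c 'liaobin.com' "$HOSTS_FILE"
```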
2). Time synchronization, normally done with an ntpd server (for convenience I simply used the date command to set both nodes to the same time)
# date -s 11:11:11
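Even with manually set clocks it is worth checking the skew between the two nodes. A minimal sketch (it assumes the passwordless ssh configured in the next step; when node2 is unreachable it falls back to the local clock, so it can also be dry-run on a single machine):

```shell
# Compare this node's clock with node2's, in whole seconds.
local_time=$(date +%s)
# Fall back to the local timestamp if node2 cannot be reached (dry run).
remote_time=$(ssh -o ConnectTimeout=2 node2 'date +%s' 2>/dev/null || echo "$local_time")
skew=$(( local_time - remote_time ))
echo "clock skew: ${skew#-}s"
```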
3). node1 <-> node2: passwordless ssh login
On node1:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
On node2:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
4). Install corosync and pacemaker (pointing the yum repo at CD1 is sufficient)
# yum install -y corosync pacemaker
Part 2: Configure corosync (on node1)
1). Copy the sample configuration as the working configuration
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
2). Edit /etc/corosync/corosync.conf (only the settings to change or add are listed)
————————————Changes————————————————
secauth: on #enable authentication (when on, a key must be generated with corosync-keygen)
bindnetaddr: 192.168.191.0 #the network address; note this is the NETWORK address, not a host address
mcastaddr: 239.25.11.12 #multicast address used for heartbeat traffic
to_logfile: yes #log to a local file
logfile: /var/log/cluster/corosync.log #location of the log file
to_syslog: no #disable logging via rsyslog
————————————Additions————————————————
#run pacemaker as a corosync plugin so that it starts together with corosync
service {
ver: 0
name: pacemaker
# use_mgmtd: yes #run mgmtd as a daemon; appears to have no effect, optional
}
#optional: run the AIS executive as root
aisexec {
user: root
group: root
}
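Put together, the changed and added parts of /etc/corosync/corosync.conf look roughly like this (addresses are the ones used in this guide; all other options, such as mcastport and ringnumber, keep their defaults from the sample file):

```
totem {
        secauth: on
        interface {
                bindnetaddr: 192.168.191.0
                mcastaddr: 239.25.11.12
        }
}
logging {
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        to_syslog: no
}
service {
        ver: 0
        name: pacemaker
}
aisexec {
        user: root
        group: root
}
```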
3). Run corosync-keygen to generate the key file authkey (just run it, no arguments needed)
# corosync-keygen
4). Copy the corosync configuration file and authkey over to the other node, node2
# cd /etc/corosync/
# scp corosync.conf authkey node2:/etc/corosync
Part 3: Verify that corosync starts correctly (test on both node1 and node2)
1). Check that the corosync engine started normally:
[root@node1 ~]# service corosync start; ssh node2 'service corosync start'
[root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Mar 26 21:30:29 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Mar 26 21:30:29 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 26 21:31:06 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
2). Check that the initial membership notifications were sent out correctly:
[root@node1 ~]# grep TOTEM /var/log/cluster/corosync.log
Mar 26 21:30:29 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Mar 26 21:30:29 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 26 21:30:29 corosync [TOTEM ] The network interface [192.168.191.112] is now up.
3). Check for errors during startup. The error messages below indicate that pacemaker will soon no longer run as a corosync plugin, and that cman is recommended as the cluster infrastructure instead; they can safely be ignored here.
[root@node1 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Mar 26 15:41:56 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Mar 26 15:41:56 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
4). Check that pacemaker started normally:
[root@node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log
Mar 26 15:41:56 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Mar 26 15:41:56 corosync [pcmk ] Logging: Initialized pcmk_startup
Mar 26 15:41:56 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Mar 26 15:41:56 corosync [pcmk ] info: pcmk_startup: Service: 9
Mar 26 15:41:56 corosync [pcmk ] info: pcmk_startup: Local hostname: node1.liaobin.com
Part 4: Install crmsh (on both nodes, so the cluster status can be checked from either)
Note: crmsh depends on pssh, so download both packages.
Package versions: pssh-2.3.1-2.el6.x86_64.rpm, crmsh-2.1-1.6.x86_64.rpm
1). Install:
# yum -y --nogpgcheck localinstall crmsh*.rpm pssh*.rpm
2). Check the node status:
[root@node1 ~]# crm status
Last updated: Thu Mar 26 21:45:07 2015
Last change: Thu Mar 26 17:21:29 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum   # the DC is node2
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ node1.liaobin.com node2.liaobin.com ]   # both node1 and node2 are online
Part 5: Configure the NFS service and the httpd service on node1 and node2
1). On the NFS server:
# mkdir /shared
# echo "/shared 192.168.191.*(rw)" >> /etc/exports
# service nfs restart
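Each line in /etc/exports has the form "path client(options)". A tiny dry run of the entry being appended above, using the same values as this guide:

```shell
# Build the exports entry before appending it to /etc/exports.
shared_dir=/shared
clients='192.168.191.*'   # any host on the cluster subnet
line="$shared_dir ${clients}(rw)"
echo "$line"
```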
2). On node1:
# echo "nfs" > /var/www/html/index.html
# chkconfig httpd off
# service httpd stop
3). On node2:
# echo "nfs" > /var/www/html/index.html
# chkconfig httpd off
# service httpd stop
Part 6: Configure the cluster (on node1)
1). Disable STONITH; the default configuration has no usable STONITH device
Verification (if the error messages below appear, STONITH needs to be disabled):
# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
Disable it:
# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore # required for a two-node cluster; do not set this with three or more nodes
2). Review the current configuration:
# crm configure show
3). Add the resources
[root@node1 ~]# crm
Enter configure mode
crm(live)# configure
Define the IP address resource, with a 10s monitor interval and a 20s timeout
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.191.199 op monitor interval=10s timeout=20s
After every change, run verify to check for errors
crm(live)configure# verify
Define the NFS mount, with a 20s monitor interval, a 40s monitor timeout, and 60s start and stop timeouts
crm(live)configure# primitive nfsserver ocf:heartbeat:Filesystem params device=192.168.191.111:/shared directory=/var/www/html fstype=nfs op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s
crm(live)configure# verify
Define the httpd service, with a 10s monitor interval and a 20s timeout
crm(live)configure# primitive webserver lsb:httpd op monitor interval=10s timeout=20s
crm(live)configure# verify
Create a group webservice containing the webip, nfsserver, and webserver resources; note that the order matters
crm(live)configure# group webservice webip nfsserver webserver
Give the webservice group a location preference of 100 for node1, making node1 the default node when the group starts
crm(live)configure# location web_on_node1 webservice rule 100: uname eq node1.liaobin.com
Set the resource stickiness to 50 so that after node1 goes offline and the resources move to node2, node1 will not grab the resources back when it comes online again.
If node1 performs much better than node2, you can skip this setting and let node1 reclaim the resources.
crm(live)configure# property default-resource-stickiness=50
Show the defined resources
crm(live)configure# show
Use cd .. to go back up one level
crm(live)configure# cd ..
Use status to check the state; the resources are now running on node1
crm(live)# status
Last updated: Thu Mar 26 22:34:05 2015
Last change: Thu Mar 26 22:22:51 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ node1.liaobin.com node2.liaobin.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node1.liaobin.com
nfsserver (ocf::heartbeat:Filesystem): Started node1.liaobin.com
webserver (lsb:httpd): Started node1.liaobin.com
Test access in a browser:
Use node to enter the node menu
crm(live)# node
Use the standby command to put node1 into standby mode
crm(live)node# standby
Switch to node2:
[root@node2 ~]# crm
crm(live)# status
Last updated: Thu Mar 26 22:37:14 2015
Last change: Thu Mar 26 22:35:46 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Node node1.liaobin.com: standby
Online: [ node2.liaobin.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node2.liaobin.com
nfsserver (ocf::heartbeat:Filesystem): Started node2.liaobin.com
webserver (lsb:httpd): Started node2.liaobin.com
The resources have now moved to node2
Browser test:
The page loads successfully, showing that the HA cluster is working correctly.
Switch back to node1:
Use the online command to bring node1 back online
crm(live)node# online
crm(live)node# cd ..
crm(live)# status
Last updated: Thu Mar 26 22:39:06 2015
Last change: Thu Mar 26 22:38:58 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ node1.liaobin.com node2.liaobin.com ]
Resource Group: webservice
webip (ocf::heartbeat:IPaddr): Started node2.liaobin.com
nfsserver (ocf::heartbeat:Filesystem): Started node2.liaobin.com
webserver (lsb:httpd): Started node2.liaobin.com
The resources are still on node2; they did not move back to node1 even though it has the higher location preference.
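Why node1 does not get the resources back even though its location score (100) is higher than the stickiness (50): stickiness applies per resource, and a group accumulates its members' values. A quick check of the arithmetic, using the resource count and scores from this configuration:

```shell
# Group stickiness is the sum of the member resources' stickiness values.
resources=3          # webip, nfsserver, webserver
stickiness=50        # default-resource-stickiness
location_pref=100    # score of the web_on_node1 location constraint
group_stickiness=$(( resources * stickiness ))
echo "stay-on-node2 score: $group_stickiness vs move-to-node1 score: $location_pref"
```

Since 150 > 100, the group stays on node2; with a single resource (50 < 100), it would have moved back to node1.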