Building a High-Availability Web Cluster with Corosync + Pacemaker
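When passing messages, Corosync uses a simple configuration file to define how the messages are transmitted, which protocol is used, and so on. It is relatively new software, released in 2008, but it is not really new in the strict sense: the OpenAIS project, started in 2002, grew too large and was split into two subprojects, and the part that handles HA heartbeat messaging became Corosync. About 60% of its code comes from OpenAIS. Corosync can provide complete HA functionality, but more numerous and more complex features require OpenAIS. Corosync is the direction things are heading, and new projects generally adopt it. hb_gui offers good HA management features with a graphical interface; another graphical option is the luci + ricci suite from RHCS.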
Node 1: IP address 172.16.23.11, hostname node1.wl.com, primary server
Node 2: IP address 172.16.23.12, hostname node2.wl.com, standby server
172.16.23.11 node1.wl.com node1 // alias
172.16.23.12 node2.wl.com node2 // alias
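As a quick sanity check of the name mappings above, the following sketch looks up a name in a hosts-format file and prints the address it maps to. The function name hosts_lookup is a hypothetical helper introduced here for illustration; it is not part of the original setup.

```shell
#!/bin/sh
# hosts_lookup NAME FILE
# Print the IP address that NAME (FQDN or alias) maps to in a hosts-format FILE.
# Hypothetical helper for illustration only.
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        {
            for (i = 2; i <= NF; i++)        # fields after the address are names
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit (found ? 0 : 1) }         # non-zero status when NAME is absent
    ' "$2"
}
```

On a node, something like `hosts_lookup node2 /etc/hosts` should print the same address on both machines if the entries were added consistently.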
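date 0416185012.33 // set the current time; the format is MMDDhhmmYY.ss (month, day, hour, minute, year, seconds)
hwclock -w // write the system time to the hardware clock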
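The MMDDhhmmYY.ss argument is easy to mistype, so here is a small sketch that derives it from a readable timestamp. It assumes GNU date (for the -d option), and the function name to_date_arg is hypothetical.

```shell
#!/bin/sh
# to_date_arg "YYYY-MM-DD hh:mm:ss"
# Convert a readable timestamp into the MMDDhhmmYY.ss form that `date`
# accepts when setting the clock. Assumes GNU date, which provides -d.
to_date_arg() {
    date -d "$1" +%m%d%H%M%y.%S
}

to_date_arg "2012-04-16 18:50:33"   # prints 0416185012.33
```

The printed string can then be passed to date on both nodes so that their clocks agree.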
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
# yum -y --nogpgcheck localinstall *.rpm // local yum install, which also resolves dependencies
Install the httpd service: yum install httpd
# vim /corosync.conf   The configuration file contents are as follows:
totem {
mcastaddr: 226.194.1.23 // multicast mode; any address in the 224-239 range can be used
mcastport: 5405
}
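logging { // logging subsystem settings
to_logfile: yes
to_syslog: no // there are two log destinations; syslog is turned off here so that log entries are easier to find
logfile: /var/log/cluster/corosync.log
debug: off // can be turned on temporarily for troubleshooting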
timestamp: on
logger_subsys {
}
group: root
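The mcastaddr value in the totem section must be an IPv4 multicast address, meaning its first octet lies between 224 and 239. The sketch below checks only that first octet; the function name is_multicast is hypothetical. Note that some addresses inside the range, such as 224.0.0.x, are reserved for local network control, so a value elsewhere in the range is the safer choice.

```shell
#!/bin/sh
# is_multicast ADDR
# Succeed if the first octet of the dotted-quad ADDR lies in 224..239,
# i.e. the IPv4 multicast range. Hypothetical helper; it does not
# validate the remaining octets.
is_multicast() {
    first=${1%%.*}
    case $first in
        ''|*[!0-9]*) return 1 ;;     # first octet is not a plain number
    esac
    [ "$first" -ge 224 ] && [ "$first" -le 239 ]
}

if is_multicast 226.194.1.23; then
    echo "226.194.1.23 is a multicast address"
fi
```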
Create the directory for the log file: mkdir /var/log/cluster
# scp -p corosync.conf authkey node2:/etc/corosync/ // copy the files to the other host
# /etc/init.d/corosync start
service corosync start
Check whether the Corosync engine started properly (log file: /var/log/cluster/corosync.log):
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Successfully read main configuration file
Jun 14 19:02:08 node1 corosync[5103]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1397.
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Corosync Cluster Engine ('1.2.7'): started and ready to provide
Jun 14 19:03:49 node1 corosync[5120]: [MAIN ] Successfully read main configuration file
Check whether the initial membership notifications were sent properly:
# grep TOTEM /var/log/cluster/corosync.log
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transport (UDP/IP).
Jun 14 19:03:49 node1 corosync[5120]: [TOTEM ] Initializing transmit/receive security: libtomcrypt
SOBER128/SHA1HMAC (mode 0).
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] The network interface [192.168.0.5] is now up.
Jun 14 19:03:50 node1 corosync[5120]: [TOTEM ] A processor joined or left the membership and a new membership was
Check whether any errors occurred during startup:
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Check whether Pacemaker started properly:
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: CRM: Initialized
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] Logging: Initialized pcmk_startup
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Service: 9
Jun 14 19:03:50 node1 corosync[5120]: [pcmk ] info: pcmk_startup: Local hostname: node1.a.org
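The manual log checks above can be bundled into a single pass. The following is only a sketch: the function name check_corosync_log is hypothetical, and the patterns are taken from the sample log lines shown in this walkthrough, so they may need adjusting for other Corosync versions.

```shell
#!/bin/sh
# check_corosync_log FILE
# Hypothetical helper that repeats the manual log checks in one pass.
# Returns 0 only if every check passes; otherwise prints the first failure.
check_corosync_log() {
    log=$1
    grep -q "Corosync Cluster Engine.*started and ready" "$log" ||
        { echo "engine start message not found"; return 1; }
    grep -q "Successfully read main configuration file" "$log" ||
        { echo "configuration file was not read"; return 1; }
    grep -q "TOTEM" "$log" ||
        { echo "no TOTEM messages found"; return 1; }
    grep -q "pcmk_startup: CRM: Initialized" "$log" ||
        { echo "pacemaker startup message not found"; return 1; }
    # Same filter as the manual check: ignore unpack_resources errors.
    if grep "ERROR:" "$log" | grep -qv unpack_resources; then
        echo "errors found in log"; return 1
    fi
    echo "all checks passed"
}
```

Because the log file accumulates entries across restarts, a match may come from an earlier start; checking the timestamps by hand is still worthwhile when something looks wrong.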
If all of the commands above ran without problems, start Corosync on node2 with the following command:
# ssh node2 '/etc/init.d/corosync' start
chkconfig httpd off
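Note: start node2 from node1 using the command above; do not start it directly on node2.
Use the following command to check the startup status of the cluster nodes: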
# crm status
Last updated: Tue Jun 14 19:07:06 2011
Stack: openais
Current DC: node1.a.org - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
0 Resources configured.
Online: [ node1.a.org node2.a.org ]
The output above shows that both nodes have started properly and that the cluster is in a normal working state.
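When this check needs to be scripted, the node names can be pulled out of the "Online:" line. This is a minimal sketch that assumes the status output has the layout shown above; the function name online_nodes is hypothetical.

```shell
#!/bin/sh
# online_nodes
# Read `crm status` output on stdin and print the node names listed on the
# "Online: [ ... ]" line, one per line. Hypothetical helper; it assumes the
# output layout shown in this walkthrough.
online_nodes() {
    # Fields are: "Online:" "[" name... "]", so names sit in fields 3..NF-1.
    awk '/^Online:/ { for (i = 3; i < NF; i++) print $i }'
}
```

For example, `crm status | online_nodes` would list one node name per line, which makes it easy to compare against the expected members.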
# crm_verify -L
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources
have been defined
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the
stonith-enabled option
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to
ensure data integrity
Errors found during check: config not valid
For now, we can disable STONITH with the following command:
# crm configure property stonith-enabled=false
Use the following command to view the current configuration:
# crm configure show
node node1.a.org
node node2.a.org
property $id="cib-bootstrap-options" \
cluster-infrastructure="openais" \
This shows that STONITH has been disabled.
List the resource agent classes supported by the cluster:
# crm ra classes
ocf / heartbeat pacemaker
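Note: the Corosync/Pacemaker stack supports resource agents of the heartbeat, LSB, and OCF classes, among others. LSB and OCF are the most commonly used today; the stonith class is dedicated to configuring STONITH devices.
To list the resource agents in a given class:
# crm ra list <class> [<provider>] // one of: lsb | ocf heartbeat | ocf pacemaker | stonith
Example: view the help information for a resource agent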
# crm ra info ocf:heartbeat:IPaddr
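Resource agents are named by class, an optional provider, and type, separated by colons. The sketch below splits such a name into its parts to make the format explicit; parse_ra is a hypothetical helper, not a crm command.

```shell
#!/bin/sh
# parse_ra SPEC
# Split a resource agent spec into its parts. OCF agents are written
# class:provider:type (ocf:heartbeat:IPaddr); LSB agents have no provider
# and are written class:type (lsb:httpd). Hypothetical helper.
parse_ra() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # split the spec on ':' into positional parameters
    IFS=$old_ifs
    case $# in
        3) echo "class=$1 provider=$2 type=$3" ;;
        2) echo "class=$1 type=$2" ;;
        *) echo "unrecognized spec" >&2; return 1 ;;
    esac
}

parse_ra ocf:heartbeat:IPaddr   # prints: class=ocf provider=heartbeat type=IPaddr
parse_ra lsb:httpd              # prints: class=lsb type=httpd
```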
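# crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=172.16.23.1 // the IP address clients use to reach the website
primitive: a basic (primitive) resource type
The output of the following command shows that this resource has been started on node1.a.org: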
# crm status
Of course, you can also run ifconfig on node1 to see that the address has taken effect on an alias of eth0:
# ifconfig
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Next, from node2, stop the Corosync service on node1 with the following command:
# ssh node1 "/etc/init.d/corosync" stop
Check the cluster status:
# crm status
Last updated: Tue Jun 14 19:37:23 2011
Stack: openais
Current DC: node2.a.org - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node2.a.org ]
OFFLINE: [ node1.a.org ]
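The output above shows that node1.a.org is offline, but the WebIP resource was not started on node2.a.org. This is because the cluster is now in the "WITHOUT quorum" state: quorum has been lost, so the cluster no longer meets the conditions for running resources. With only two nodes, the surviving node holds 1 of the 2 expected votes, which is not a majority, so this behavior is unreasonable for a two-node cluster. We can therefore use the following command to make the cluster ignore the loss of quorum: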
# crm configure property no-quorum-policy=ignore
After a moment, the cluster starts the resource on node2, the node that is still running, as shown below:
# crm status
Last updated: Tue Jun 14 19:43:42 2011
Stack: openais
Current DC: node2.a.org - partition WITHOUT quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node2.a.org ]
OFFLINE: [ node1.a.org ]
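The quorum behavior seen here follows from simple majority arithmetic. The sketch below is a simplified model for illustration only; the real calculation is done by the cluster stack, and the function name has_quorum is hypothetical.

```shell
#!/bin/sh
# has_quorum VOTES_PRESENT EXPECTED_VOTES
# Succeed if the partition holds a strict majority of the expected votes.
# Simplified model for illustration; not how the cluster stack is queried.
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

if has_quorum 1 2; then
    echo "1 of 2 votes: quorum"
else
    echo "1 of 2 votes: no quorum"    # this branch runs
fi
```

With three nodes, one failure still leaves 2 of 3 votes, which is a majority; with two nodes, any single failure loses quorum, which is why no-quorum-policy=ignore is set in this setup.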
Once the verification is complete, start node1.a.org normally again:
# ssh node1 -- /etc/init.d/corosync start
# crm configure rsc_defaults resource-stickiness=100
# chkconfig httpd off // httpd must not start automatically at boot, because the cluster starts it
Here we use the lsb class:
Create the WebSite resource:
# crm configure primitive WebSite lsb:httpd
View the definitions generated in the configuration:
primitive WebIP ocf:heartbeat:IPaddr \
primitive WebSite lsb:httpd
property $id="cib-bootstrap-options" \
cluster-infrastructure="openais" \
Check the status of the resources:
# crm status
Online: [ node1.a.org node2.a.org ]
WebSite (lsb:httpd): Started node2.a.org
Therefore, the problem seen above, where WebIP and WebSite may run on different nodes, can be solved with the following command:
# crm configure colocation website-with-ip INFINITY: WebSite WebIP
Next, we also need to make sure that WebIP is started before WebSite on a node, which can be done with the following command:
# crm configure order httpd-after-ip mandatory: WebIP WebSite
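To review the whole resource configuration in one place, the crm commands used in this walkthrough can be collected in a small script. This is a sketch under assumptions: the run wrapper, the DRY_RUN variable, and the setup_web_cluster function are hypothetical names, and by default the script only prints the commands rather than executing them.

```shell
#!/bin/sh
# run CMD...
# Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper that replays the crm commands from this walkthrough
# in the order they were introduced.
setup_web_cluster() {
    run crm configure property stonith-enabled=false
    run crm configure property no-quorum-policy=ignore
    run crm configure rsc_defaults resource-stickiness=100
    run crm configure primitive WebIP ocf:heartbeat:IPaddr params ip=172.16.23.1
    run crm configure primitive WebSite lsb:httpd
    run crm configure colocation website-with-ip INFINITY: WebSite WebIP
    run crm configure order httpd-after-ip mandatory: WebIP WebSite
}

setup_web_cluster    # with DRY_RUN unset, this only prints the seven commands
```

Setting DRY_RUN=0 would execute the commands for real, so that should only be done on a node where the cluster is already running and the configuration has been reviewed.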