Linux Cluster Study, Part 2: pacemaker + corosync + pcs Experiment

Objective: use corosync as the cluster messaging layer (Message Layer), pacemaker as the cluster resource manager (Cluster Resource Manager, CRM), and pcs as the management interface for the CRM. The goal is to make the httpd service highly available.
Environment: CentOS 6.9
Pacemaker: 1.1.15
Corosync: 1.4.7
pcs: 0.9.155
Preparation:

  1. Set up SSH key-based mutual trust between the two nodes;
  2. Configure hostname resolution in the /etc/hosts file (a short sketch of items 1 and 2 follows this list);
  3. Stop the firewall: service iptables stop
  4. Disable SELinux: setenforce 0
  5. Disable NetworkManager: chkconfig NetworkManager off and service NetworkManager stop
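A minimal sketch of items 1 and 2, run from node1 (the 192.168.110.x addresses are assumptions based on the VIP used later; substitute your own):
ssh-keygen -t rsa                            # generate a key pair (accept the defaults)
ssh-copy-id root@node2                       # copy the public key to node2; repeat from node2 for full mutual trust
echo "192.168.110.101 node1" >> /etc/hosts   # example addresses; keep /etc/hosts identical on both nodes
echo "192.168.110.102 node2" >> /etc/hosts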
1. Software installation
corosync, pacemaker and pcs can all be installed directly from the yum repositories:
yum install corosync pacemaker pcs -y

2. Start the pcsd service (must be started on both nodes)
service pcsd start

[root@node1 ~]# service pcsd start
Starting pcsd: [ OK ]
[root@node1 ~]#
[root@node1 ~]#
[root@node1 ~]# ssh node2 "service pcsd start"
Starting pcsd: [ OK ]
[root@node1 ~]#
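Optionally, pcsd can also be enabled at boot so it survives a reboot (a small sketch; not strictly required for this experiment):
chkconfig pcsd on
ssh node2 "chkconfig pcsd on"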

3. Set the password for the hacluster account (on both nodes)
Set a password for hacluster; pcs uses this account to authenticate to pcsd.
[root@node1 ~]# grep "hacluster" /etc/passwd
hacluster:x:496:493:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
[root@node2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@node2 ~]#
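The password can also be set non-interactively on both nodes from node1 (a sketch; "redhat123" is only a placeholder):
echo "redhat123" | passwd --stdin hacluster
ssh node2 'echo "redhat123" | passwd --stdin hacluster'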

4. Configure corosync: generate the /etc/cluster/cluster.conf file and complete the pcs-to-pcsd authentication on the specified nodes.
[root@node1 ~]# pcs cluster auth node1 node2
Username: hacluster //enter the account name and the password set above
Password:
node1: Authorized
Error: Unable to communicate with node2

[root@node1 ~]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
[root@node1 ~]#
[root@node1 ~]#
[root@node1 ~]# getenforce
Disabled

[root@node1 ~]# pcs cluster auth node1 node2
Username: hacluster
Password:
node1: Authorized
node2: Authorized
After authentication succeeds, pcsd can be seen listening on TCP port 2224:
tcp 0 0 :::2224 :::* LISTEN 2303/ruby
Note: the firewall must be stopped, or iptables must be configured to allow TCP port 2224 (a sketch follows).
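If you would rather keep iptables running, a sketch of opening the pcsd port instead of stopping the firewall entirely (the cluster traffic itself uses additional UDP ports, commonly 5404/5405, which would also need to be allowed):
iptables -I INPUT -p tcp --dport 2224 -j ACCEPT   # allow pcs/pcsd communication
service iptables save                             # persist the rule on CentOS 6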

pcs cluster setup --name mycluster node1 node2 //configure the cluster-wide parameters
The cluster parameters can only be set up successfully, and the cluster.conf file generated, once the prerequisites are in place on both servers (pcsd running, NetworkManager stopped, and so on).
[root@node1 corosync]# pcs cluster setup --name webcluster node1 node2 --force
Destroying cluster on nodes: node1, node2...
node1: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node2: Successfully destroyed cluster

Sending cluster config files to the nodes...
node1: Updated cluster.conf...
node2: Updated cluster.conf...

Synchronizing pcsd certificates on nodes node1, node2...
node1: Success
node2: Success

Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node2: Success

[root@node1 corosync]# cat /etc/cluster/cluster.conf //the generated configuration file
<cluster config_version="9" name="webcluster">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="pcmk-method">
          <device name="pcmk-redirect" port="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman broadcast="no" expected_votes="1" transport="udp" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk-redirect"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

5. Start the cluster services
pcs cluster start --all //make sure the NetworkManager service is stopped first
pcs --debug cluster start --all can be used to turn on debug mode and inspect errors; here it reported that the NetworkManager service had to be stopped.
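For reference, a sketch of starting the stack and optionally enabling it at boot (the Daemon Status output below shows everything as active/disabled because this enable step was skipped here):
pcs cluster start --all     # start cman/corosync and pacemaker on all nodes
pcs cluster enable --all    # optional: start the cluster stack automatically at boot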

6. Check the cluster and node status:
pcs status
pcs status corosync
pcs status cluster
[root@node1 corosync]# pcs status
Cluster name: webcluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: cman
Current DC: node1 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Mon Apr 30 10:06:42 2018 Last change: Mon Apr 30 09:26:32 2018 by root via crmd on node2

2 nodes and 0 resources configured

Online: [ node1 node2 ]

No resources

Daemon Status:
cman: active/disabled
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
[root@node1 corosync]# pcs status corosync
Nodeid Name
1 node1
2 node2

Check for configuration errors; they can all be seen to be related to stonith:
crm_verify -L -V
Use the following command to silence these errors:
pcs property set stonith-enabled=false #clears the stonith errors
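A short verification sketch: re-run crm_verify after disabling stonith to confirm the configuration is now clean:
crm_verify -L -V                         # reports the stonith-related errors
pcs property set stonith-enabled=false   # acceptable here because this lab has no fencing device
crm_verify -L -V                         # should now return without complaints
pcs property list                        # confirm stonith-enabled: false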

7. Configure the resources

  1. Configure the VIP resource
    pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.110.150 cidr_netmask=24 op monitor interval=30s
    Use pcs status to check whether the resource has started. Note: in this test the netmask had to match the netmask on the NIC, otherwise the resource would not start.

  2. Configure the httpd resource
    There are two options here: ocf:heartbeat:apache or lsb:httpd. With the former, the httpd service has to be started manually on both servers; with the latter, the service is started by the pacemaker cluster itself (see the sketch after this list).
    pcs resource create web lsb:httpd op monitor interval=20s
    pcs status shows that the resource has started.
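For comparison, a hedged sketch of the ocf:heartbeat:apache variant, created instead of the lsb:httpd resource above (the parameter values are assumptions; the apache agent monitors via mod_status, so a /server-status location must be enabled in httpd.conf first):
pcs resource create web ocf:heartbeat:apache \
  configfile=/etc/httpd/conf/httpd.conf \
  statusurl="http://127.0.0.1/server-status" \
  op monitor interval=20s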

At the same time, you can run service httpd status directly on the corresponding node to check whether the service is running, and ip addr to see whether the VIP has been acquired (a quick verification sketch follows).
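A quick verification sketch (192.168.110.150 is the VIP configured above; run the first two commands on the node currently holding the resources):
ip addr show | grep 192.168.110.150   # the VIP should appear on the active node
service httpd status                  # httpd should be running there
curl http://192.168.110.150/          # from any host that can reach the VIP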

8. Configure resource constraints

  1. Configure the resource start order (order constraint): vip must start before web.
    pcs constraint order vip then web
  2. Configure location constraints so that the resources prefer to run on node1: set the priority of vip/web for node1 to 150 and for node2 to 50:
    pcs constraint location web prefers node1=150
    pcs constraint location vip prefers node1=150
    pcs constraint location web prefers node2=50
    pcs constraint location vip prefers node2=50
    [root@node1 ~]# pcs constraint

Location Constraints:
  Resource: vip
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)
  Resource: web
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)

Note: if multiple resources end up on different nodes while they all have to be on the same node to provide the service, the cluster cannot work properly.
Only after the priority of both web and vip for node1 had been adjusted to 150 could the cluster provide the service normally; otherwise the two resources could land on different nodes and the service would be unavailable. To review or back out a location constraint, see the sketch below.
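A sketch of inspecting and removing location constraints (pcs constraint --full prints the constraint IDs that pcs constraint remove expects; the ID shown is only an example of the auto-generated form):
pcs constraint --full                         # list all constraints with their IDs
pcs constraint remove location-web-node2-50   # remove one constraint by its ID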

  3. Configure a resource group; only when both resources have the same location priority on a node does the group fail over as a whole:
    pcs resource group add mygroup vip web
    [root@node2 ~]# pcs status groups
    mygroup: vip web
    [root@node1 ~]# pcs resource
    Resource Group: httpgroup
      vip (ocf::heartbeat:IPaddr2): Started node1
      web (lsb:httpd): Started node1

[root@node1 ~]# crm_simulate -sL

Current cluster status:
Online: [ node1 node2 ]

Resource Group: httpgroup
  vip (ocf::heartbeat:IPaddr2): Started node1
  web (lsb:httpd): Started node1

Allocation scores:
group_color: httpgroup allocation score on node1: 0
group_color: httpgroup allocation score on node2: 0
group_color: vip allocation score on node1: 100
group_color: vip allocation score on node2: 50
group_color: web allocation score on node1: 100
group_color: web allocation score on node2: 50
native_color: web allocation score on node1: 200
native_color: web allocation score on node2: 100
native_color: vip allocation score on node1: 400
native_color: vip allocation score on node2: 150

The priority can also be adjusted for the resource group as a whole, as follows:
pcs constraint location httpgroup prefers node2=100
pcs constraint location httpgroup prefers node1=200
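Alternatively, the whole group can be pushed to a node with pcs resource move (a sketch; the move is implemented as an extra location constraint, which stays in place until it is removed by ID):
pcs resource move httpgroup node2   # relocate the group to node2
pcs constraint --full               # the move shows up as an additional location constraint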

[root@node1 ~]# pcs constraint
Location Constraints:
  Resource: httpgroup
    Enabled on: node1 (score:200)
    Enabled on: node2 (score:100)
  Resource: vip
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)
  Resource: web
    Enabled on: node1 (score:100)
    Enabled on: node2 (score:50)
Ordering Constraints:
  start vip then start web (kind:Mandatory)

  4. Configure a colocation constraint so that the vip and web resources run together, with a score of 100:
    [root@node1 ~]# pcs constraint colocation add vip with web 100
    [root@node1 ~]# pcs constraint show
    Location Constraints:
      Resource: httpgroup
        Enabled on: node1 (score:200)
        Enabled on: node2 (score:100)
      Resource: vip
        Enabled on: node1 (score:100)
        Enabled on: node2 (score:50)
      Resource: web
        Enabled on: node1 (score:100)
        Enabled on: node2 (score:50)
    Ordering Constraints:
      start vip then start web (kind:Mandatory)
    Colocation Constraints:
      vip with web (score:100)
    Ticket Constraints:

9. Switching resources between nodes
pcs constraint location web prefers node1=100 //after adjusting the location priority of the web resource for node1 to 100, the resources can be seen moving from node2 back to node1. Note that httpgroup can be adjusted as a whole, or the priorities of both web and vip for node2 can be adjusted together.
May 1 09:43:02 node1 crmd[2965]: notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
May 1 09:43:02 node1 pengine[2964]: warning: Processing failed op monitor for web on node2: not running (7)
May 1 09:43:02 node1 pengine[2964]: notice: Move web#011(Started node2 -> node1)
May 1 09:43:02 node1 pengine[2964]: notice: Calculated transition 4, saving inputs in /var/lib/pacemaker/pengine/pe-input-57.bz2
May 1 09:43:02 node1 crmd[2965]: notice: Initiating stop operation web_stop_0 on node2 | action 6
May 1 09:43:02 node1 crmd[2965]: notice: Initiating start operation web_start_0 locally on node1 | action 7
May 1 09:43:03 node1 lrmd[2962]: notice: web_start_0:3682:stderr [ httpd: Could not reliably determine the server's fully qualified domain name, using node1.yang.com for ServerName ]
May 1 09:43:03 node1 crmd[2965]: notice: Result of start operation for web on node1: 0 (ok) | call=12 key=web_start_0 confirmed=true cib-update=42
May 1 09:43:03 node1 crmd[2965]: notice: Initiating monitor operation web_monitor_20000 locally on node1 | action 8
May 1 09:43:03 node1 crmd[2965]: notice: Transition 4 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-57.bz2): Complete
May 1 09:43:03 node1 crmd[2965]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd

10. Outstanding issue
After the web resource was configured, one error remained that was never resolved:
[root@node1 ~]# pcs status
Cluster name: mycluster
Stack: cman
Current DC: node1 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Tue May 1 11:13:00 2018 Last change: Tue May 1 11:04:20 2018 by root via cibadmin on node1

2 nodes and 2 resources configured

Online: [ node1 node2 ]

Full list of resources:

Resource Group: httpgroup
  vip (ocf::heartbeat:IPaddr2): Started node1
  web (lsb:httpd): Started node1

Failed Actions:       //this looks related to the monitor operation, but the cause was never determined
* web_monitor_20000 on node2 'not running' (7): call=11, status=complete, exitreason='none',
    last-rc-change='Tue May  1 09:41:09 2018', queued=0ms, exec=16ms

Daemon Status:
cman: active/disabled
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
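One thing worth trying against the leftover Failed Actions entry is clearing the resource's failure history so that the monitor runs again (a sketch; this only clears the record and will not fix an underlying agent problem):
pcs resource cleanup web   # reset the fail count and failed-operation history for web
pcs status                 # the Failed Actions entry disappears if the monitor now passes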

Command reference:
pcs cluster: handles pcsd authentication for the cluster, cluster parameter setup, starting cluster nodes, removing nodes, and so on.

pcs cluster stop node1 //stop the cluster stack on node1
[root@node2 ~]# pcs status
Cluster name: mycluster
Stack: cman
Current DC: node2 (version 1.1.15-5.el6-e174ec8) - partition with quorum
Last updated: Sat Apr 28 02:23:00 2018 Last change: Sat Apr 28 02:16:12 2018 by root via cibadmin on node2

2 nodes and 2 resources configured

Online: [ node2 ]
OFFLINE: [ node1 ]

Full list of resources:

Resource Group: mygroup
  vip (ocf::heartbeat:IPaddr2): Started node2
  web (lsb:httpd): Started node2

Daemon Status:
cman: active/disabled
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
At this point, because the cluster stack on node1 had been stopped, it could not be started directly from node2 and had to be started on node1 itself:
[root@node1 ~]# pcs status
Error: cluster is not currently running on this node
[root@node1 ~]#
[root@node1 ~]#
[root@node1 ~]# pcs cluster start node1
node1: Starting Cluster...
pcs resource: resource-related commands, including creating, deleting, enabling/disabling and describing resources (a few examples below).
pcs constraint: commands for configuring resource constraints.
pcs status: commands for viewing resource and cluster status.
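A few pcs resource examples, as a sketch of the sub-commands mentioned above:
pcs resource show web      # show the web resource and its options
pcs resource disable web   # stop the resource without removing it
pcs resource enable web    # start it again
pcs resource delete web    # remove the resource from the cluster configuration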
