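This article is fairly basic. It covers deploying and using repmgr, an open-source high-availability tool for PostgreSQL, and beginners can follow the steps one at a time. Without further ado, let's get to it. The examples use two machines: node1 (192.168.1.1) as the primary and node2 (192.168.1.2) as the standby.
1. Compile and install PostgreSQL on both machines (steps omitted).
2. Primary configuration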
vi postgresql.conf
wal_log_hints=on
archive_mode=on
archive_command='test ! -f /pgarch/%f && cp %p /pgarch/%f'
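To see what this archive_command does, the sketch below reproduces its shell logic in a hypothetical archive_wal function and runs it twice against temporary directories instead of a real cluster. PostgreSQL replaces %p with the path of the WAL segment and %f with its file name, treats a zero exit status as success, and retries a segment whose command failed; the `test ! -f` guard therefore makes archiving refuse to overwrite a segment that is already in /pgarch. Keep in mind that archive_mode and wal_log_hints only take effect after a server restart, and that wal_log_hints=on is what lets pg_rewind (used later by node rejoin) work on a cluster without data checksums.

```shell
# Hypothetical helper mirroring the archive_command above:
#   archive_command='test ! -f /pgarch/%f && cp %p /pgarch/%f'
# $1 = %p (path of the WAL segment), $2 = %f (its file name), $3 = archive directory
archive_wal() {
    test ! -f "$3/$2" && cp "$1" "$3/$2"
}

# On a real server the archive directory is prepared once, as root, e.g.:
#   mkdir -p /pgarch && chown postgres:postgres /pgarch

# Demo on throwaway directories, not on a real cluster.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/pg_wal" "$demo_dir/pgarch"
seg=000000010000000000000001
echo "fake wal" > "$demo_dir/pg_wal/$seg"

# First attempt: no archived copy exists yet, so the file is copied.
archive_wal "$demo_dir/pg_wal/$seg" "$seg" "$demo_dir/pgarch" && echo "copied" || echo "refused"
# prints: copied

# Second attempt: the copy already exists, so `test` fails and nothing is overwritten.
archive_wal "$demo_dir/pg_wal/$seg" "$seg" "$demo_dir/pgarch" && echo "copied" || echo "refused"
# prints: refused

rm -rf "$demo_dir"
```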
Create the repmgr management user and database:
createuser -s repmgr
createdb repmgr -O repmgr
vi pg_hba.conf
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 192.168.1.1/32 trust
host replication repmgr 192.168.1.2/32 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.1.1/32 trust
host repmgr repmgr 192.168.1.2/32 trust
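The eight pg_hba.conf lines above follow a simple pattern: one local entry plus one host entry per address (loopback and each node), for both the replication pseudo-database and the repmgr database. As a sketch, a hypothetical gen_repmgr_hba function can print them for any list of node addresses, which saves retyping them when another node is added later. Changes to pg_hba.conf are picked up on a reload, for example pg_ctl reload -D /pgdata or SELECT pg_reload_conf(); in psql.

```shell
# Hypothetical helper: print the repmgr entries for pg_hba.conf.
# Arguments are the node IP addresses; 127.0.0.1 is always included.
gen_repmgr_hba() {
    for db in replication repmgr; do
        echo "local $db repmgr trust"
        for ip in 127.0.0.1 "$@"; do
            echo "host $db repmgr $ip/32 trust"
        done
    done
}

# Prints the same eight lines as shown above, in the same order.
gen_repmgr_hba 192.168.1.1 192.168.1.2
# After appending the output to pg_hba.conf, reload, e.g.:
#   pg_ctl reload -D /pgdata
```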
Test the connection from the standby:
psql 'host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2'
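The test above only checks one direction. Since node1 will connect to node2 once the roles are reversed by a switchover, it is worth trying every node's conninfo from every machine. The hypothetical check_conninfo function below is a sketch that runs a trivial query through psql for each connection string passed in and reports OK or FAIL; it assumes psql is on the PATH of the postgres user.

```shell
# Hypothetical helper: try each conninfo string with psql and report the result.
# Returns non-zero if at least one connection fails.
check_conninfo() {
    rc=0
    for ci in "$@"; do
        # psql accepts a full conninfo string in place of a database name.
        if psql "$ci" -Atc 'SELECT 1' >/dev/null 2>&1; then
            echo "OK   $ci"
        else
            echo "FAIL $ci"
            rc=1
        fi
    done
    return $rc
}

# Run on both node1 and node2:
#   check_conninfo \
#     'host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2' \
#     'host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2'
```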
On the primary, create /etc/repmgr.conf:
node_id=1
node_name=node1
conninfo='host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/pgdata'
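Both nodes use the same four-line repmgr.conf; only node_id, node_name and the host in conninfo differ. The hypothetical write_repmgr_conf function below is a minimal sketch that writes such a file. /etc/repmgr.conf is also one of the locations repmgr searches when -f is not given, which is why later commands such as repmgr cluster show work without -f.

```shell
# Hypothetical helper: write a minimal repmgr.conf like the one above.
# $1 = node_id, $2 = node_name, $3 = host/IP, $4 = data directory, $5 = output file
write_repmgr_conf() {
    cat > "$5" <<EOF
node_id=$1
node_name=$2
conninfo='host=$3 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='$4'
EOF
}

# On node1 (as a user allowed to write to /etc):
#   write_repmgr_conf 1 node1 192.168.1.1 /pgdata /etc/repmgr.conf
# On node2:
#   write_repmgr_conf 2 node2 192.168.1.2 /pgdata /etc/repmgr.conf
```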
Register the primary:
repmgr -f /etc/repmgr.conf primary register
Check the result, either with repmgr or by querying the repmgr.nodes table in psql:
repmgr -f /etc/repmgr.conf cluster show
SELECT * FROM repmgr.nodes;
3. Standby configuration
Create /etc/repmgr.conf:
node_id=2
node_name=node2
conninfo='host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/pgdata'
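Clone the standby. repmgr uses pg_basebackup internally and automatically creates the recovery.conf file (on PostgreSQL 12 and later, which no longer use recovery.conf, it creates standby.signal and writes the connection settings to postgresql.auto.conf instead). Run it with --dry-run first, then for real: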
repmgr -h 192.168.1.1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --dry-run
repmgr -h 192.168.1.1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone
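The two clone commands above are meant to run in sequence: the --dry-run pass checks whether the clone can proceed without copying any data. A hypothetical clone_standby wrapper, sketched below, enforces that order by only running the real clone when the dry run succeeds. repmgr refuses to clone into a non-empty data directory unless -F/--force is given, so /pgdata on node2 should be empty or absent.

```shell
# Hypothetical wrapper: run the real clone only if the --dry-run pass succeeds.
# $1 = primary host, $2 = repmgr config file (defaults to /etc/repmgr.conf)
clone_standby() {
    conf=${2:-/etc/repmgr.conf}
    if ! repmgr -h "$1" -U repmgr -d repmgr -f "$conf" standby clone --dry-run; then
        echo "dry run failed, not cloning" >&2
        return 1
    fi
    repmgr -h "$1" -U repmgr -d repmgr -f "$conf" standby clone
}

# On node2, as postgres:
#   clone_standby 192.168.1.1
```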
Start the standby, then register it:
pg_ctl -D /pgdata start
repmgr -f /etc/repmgr.conf standby register
Check the cluster status:
repmgr -f /etc/repmgr.conf cluster show
[postgres@node2 pgdata]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 3        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | default  | 100      | 3        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2
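When this output is consumed by a script, for example in a monitoring check, the primary can be picked out of the table by its Role column. The hypothetical find_primary function below is an awk sketch that assumes the pipe-separated layout shown above; it prints the name of each node listed as primary with a running status, so more than one name would mean something is wrong.

```shell
# Hypothetical helper: read `repmgr cluster show` output on stdin and print
# the Name of every node whose Role is primary and whose Status says running.
find_primary() {
    awk -F'|' '{
        name = $2; role = $3
        gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim column padding
        gsub(/^[ \t]+|[ \t]+$/, "", role)
        if (role == "primary" && $4 ~ /running/) print name
    }'
}

# Example:
#   repmgr -f /etc/repmgr.conf cluster show | find_primary
```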
Check the replication status (pg_stat_replication on the primary, pg_stat_wal_receiver on the standby):
select * from pg_stat_replication;
select * from pg_stat_wal_receiver;
Switchover test, run on the standby. Note that a switchover requires passwordless SSH trust between the hosts.
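During a switchover, repmgr on the standby stops the old primary and rejoins it by running commands on that host over SSH, which is why the postgres user needs passwordless SSH between the hosts. The sketch below, a hypothetical check_ssh_trust function, uses ssh -o BatchMode=yes so that a missing key makes ssh fail instead of prompting for a password. The key setup and the final rehearsal with repmgr standby switchover --dry-run are shown as comments; the user name and key type there are assumptions.

```shell
# Hypothetical helper: confirm passwordless SSH to each host.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_ssh_trust() {
    rc=0
    for h in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true >/dev/null 2>&1; then
            echo "ssh to $h: OK"
        else
            echo "ssh to $h: FAILED"
            rc=1
        fi
    done
    return $rc
}

# One-time key setup as postgres, e.g. on node2 (repeat in the other direction on node1):
#   ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
#   ssh-copy-id postgres@192.168.1.1
# Then verify and rehearse from node2:
#   check_ssh_trust postgres@192.168.1.1
#   repmgr -f /etc/repmgr.conf standby switchover --dry-run
```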
[postgres@DB2 .ssh]$ repmgr standby switchover
NOTICE: executing switchover on node "node2" (ID: 2)
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT
DETAIL: executing server command "pg_ctl -D '/pgdata' -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 3 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/14000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -w -D '/pgdata' promote"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
INFO: local node 1 can attach to rejoin target node 2
DETAIL: local node's recovery point: 0/14000028; rejoin target node's fork point: 0/14000098
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "pg_ctl -w -D '/pgdata' start"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
NOTICE: switchover was successful
DETAIL: node "node2" is now primary and node "node1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
Checking the status again shows that the switchover has completed:
[postgres@GCCX4TMP .ssh]$ repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 3        | host=192.168.1.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | default  | 100      | 4        | host=192.168.1.2 user=repmgr dbname=repmgr connect_timeout=2
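The switchover log above shows the sequence of steps:
① Stop the primary.
② Promote the standby to primary.
③ Rejoin the old primary to the new primary as a standby, i.e. repmgr node rejoin -d 'host=192.168.1.2 dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,postgresql.auto.conf --verbose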
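A note on repmgr node rejoin: before running it, delete the recovery.conf file, and make sure the database was shut down cleanly so that it is in a consistent state. repmgr then checks whether the node can attach to the target; if it cannot and --force-rewind was given, it runs pg_rewind to resynchronize the node. For how pg_rewind works and how to use it, see my previous article.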
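The log lines "local node 1 can attach to rejoin target node 2" and the two LSNs printed with them illustrate that check. An LSN such as 0/14000028 is two hexadecimal 32-bit halves separated by a slash, so it can be turned into a single byte position and compared. The sketch below, with hypothetical lsn_to_int and can_attach functions, models only that comparison: the old primary can follow the new primary without pg_rewind only if its recovery point is not past the point where the new timeline forked. repmgr performs further checks (timelines, clean shutdown) that are not modeled here.

```shell
# Hypothetical helper: convert an LSN like 0/14000028 (hex high/low halves)
# into one integer byte position: high * 2^32 + low.
lsn_to_int() {
    echo $(( (0x${1%/*} << 32) + 0x${1#*/} ))
}

# Hypothetical sketch of the attach test:
# $1 = local node's recovery point, $2 = rejoin target's fork point.
# Succeeds when the local node has not gone past the fork point.
can_attach() {
    [ "$(lsn_to_int "$1")" -le "$(lsn_to_int "$2")" ]
}

# Values taken from the switchover log above.
if can_attach 0/14000028 0/14000098; then
    echo "can attach without pg_rewind"
else
    echo "diverged: needs pg_rewind (--force-rewind)"
fi
# prints: can attach without pg_rewind
```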
In a later article we will look at automatic failover with repmgrd.
Welcome to follow my WeChat official account: 數據庫架構之美