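Preface

Our GitLab deployment has recently entered trial rollout, and I have just set up a primary/standby pair for its PostgreSQL database. This post describes the setup and the pitfalls I hit along the way.
Installing PostgreSQL itself is not covered here.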

Basic information

    primary_ip: 192.168.10.2
    standby_ip: 192.168.10.3
    PGDATA: /opt/gitlab/postgresql/data
    postgresql_version: (PostgreSQL) 9.6.8
    PGCONF_DIR: $PGDATA

The following configuration files are modified:

postgresql.conf --------- main PostgreSQL configuration file
pg_hba.conf ------------- PostgreSQL client access rules
recovery.conf ----------- settings the standby uses to connect to the primary

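Important notes

    1. The primary and standby must run the same PostgreSQL version.
    2. postgresql.conf must be kept identical on both servers.
    3. After the standby has been promoted to primary, never start the old primary directly! Demote it to a standby first, otherwise two primaries would be running.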

Preparation

On the primary host (192.168.10.2)

1. Prepare the primary for replication by editing its configuration files

cat postgresql.conf

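    wal_level = replica             # minimal, replica, or logical ("hot_standby" is still accepted in 9.6 and means replica)
    max_wal_senders = 2             # max number of walsender processes
    hot_standby = on                # "on" allows queries during recovery
    max_connections = 300           # (change requires restart)
    archive_mode = on
    archive_command = ''            # empty: archiving is paused and WAL is retained; restore_command belongs in recovery.conf on 9.6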

cat pg_hba.conf

    host    all             all             127.0.0.1/32            trust
    host    all             all             ::1/128                     trust
    host    replication     gitlab_replicator    192.168.10.3/32    trust

cat recovery.done

    restore_command = ''
    recovery_target_timeline = 'latest'
    standby_mode = on
    primary_conninfo = 'host=192.168.10.3 port=5432 user=gitlab_replicator'

2. Create the replication account and grant it the REPLICATION privilege

    postgres=# CREATE USER gitlab_replicator REPLICATION LOGIN;

3. Take a base backup to seed the standby

    postgres=# SELECT pg_start_backup('back_20180929');
    cd /opt/gitlab/postgresql && tar zcf base_data.tar.gz data
    postgres=# SELECT pg_stop_backup();
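
As an alternative to the manual pg_start_backup / tar / pg_stop_backup sequence, PostgreSQL ships pg_basebackup, which can pull the base backup over the replication connection directly from the standby host. The following is only a sketch and not the procedure used in this post: the function name basebackup_cmd is made up for illustration, it merely prints the command so it can be reviewed before running, and it assumes the pg_hba.conf entry above lets gitlab_replicator connect from the standby.

```shell
#!/bin/bash
# Hypothetical helper: print a pg_basebackup command for review.
# $1 = primary host, $2 = empty target data directory on the standby
basebackup_cmd()
{
        local primary_host="$1" target_dir="$2"
        # -X stream : also stream the WAL generated while the backup runs
        # -R        : write a minimal recovery.conf into the target directory
        # -P        : show progress
        echo "pg_basebackup -h ${primary_host} -p 5432 -U gitlab_replicator -D ${target_dir} -X stream -R -P"
}
```

For example, `basebackup_cmd 192.168.10.2 /opt/gitlab/postgresql/data` prints the command to run as the PostgreSQL OS user on the standby. With -X stream the tool opens a second replication connection, which still fits within max_wal_senders = 2.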

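On the standby host (192.168.10.3)

1. Unpack the base backup
Copy the base_data.tar.gz created on the primary to the standby host and extract it so that it becomes the data directory. If the extracted copy contains data/postmaster.pid (it was copied from the running primary), delete it before starting the standby.
tar zxf base_data.tar.gz -C /opt/gitlab/postgresql/

2. Edit the configuration files
Note: this part of postgresql.conf must match the primary's configuration exactly; otherwise errors may occur when switching roles and recovering.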

cat postgresql.conf

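    wal_level = replica             # minimal, replica, or logical ("hot_standby" is still accepted in 9.6 and means replica)
    max_wal_senders = 2             # max number of walsender processes
    hot_standby = on                # "on" allows queries during recovery
    max_connections = 300           # (change requires restart)
    archive_mode = on
    archive_command = ''            # empty: archiving is paused and WAL is retained; restore_command belongs in recovery.conf on 9.6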

cat pg_hba.conf

    host    all             all             127.0.0.1/32            trust
    host    all             all             ::1/128                     trust
    host    replication     gitlab_replicator    192.168.10.2/32    trust

cat recovery.conf

    restore_command = ''
    recovery_target_timeline = 'latest'
    standby_mode = on
    primary_conninfo = 'host=192.168.10.2 port=5432 user=gitlab_replicator'

3. Start the standby, run some SQL on the primary, and verify the result on the standby
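
One way to check which role a node currently has is the built-in function pg_is_in_recovery(), which returns true on a standby and false on a primary. The sketch below wraps that check; the function names are made up for illustration, and the database user "postgres" is an assumption that may differ on your installation. It is meant to be run locally on each host, since pg_hba.conf above only trusts 127.0.0.1 for ordinary connections.

```shell
#!/bin/bash
# Hypothetical helper: translate the output of "SELECT pg_is_in_recovery();"
# ("t" or "f" in psql's unaligned, tuples-only mode) into a role name.
role_from_recovery_flag()
{
        case "$1" in
                t) echo "standby" ;;
                f) echo "primary" ;;
                *) echo "unknown" ;;
        esac
}

# Ask the local server which role it has (needs psql and a running server).
# $1 = database user, assumed to be "postgres" when omitted
node_role()
{
        role_from_recovery_flag \
                "$(psql -h 127.0.0.1 -U "${1:-postgres}" -At -c 'SELECT pg_is_in_recovery();')"
}
```

On the primary, `SELECT client_addr, state FROM pg_stat_replication;` should additionally list the standby's address once streaming replication is running.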

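Switchover

Whether an instance runs as primary or standby is determined by whether recovery.conf exists in its data directory.
When the standby is promoted to primary, PostgreSQL automatically renames its recovery.conf to recovery.done. The old primary must be demoted to standby at the same time, which is done by renaming its recovery.done file:
mv recovery.done recovery.conf
Only then, once the fault on the old primary has been fixed and it is started again, will it pull in the changes written to the newly promoted primary.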

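Here is a simple approach with a script. It assumes there is no network failure between primary and standby, and that the two nodes are never both primary or both standby at the same time.
Check the state of the primary:
1. shut down
If the standby is in archive recovery, demote the primary to standby and promote the standby to primary; for any other standby state, send an alert.
2. in production
Expect the standby to be in archive recovery; for any other standby state, send an alert.
3. in archive recovery
Expect the standby to be in production; for any other standby state, send an alert.
4. shut down in recovery
Send an alert.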
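
The four rules above boil down to a small decision table. As a sketch only (decide_action is a made-up name, and the action words "failover", "ok" and "alert" are labels chosen for illustration), it can be written as a pure function that is easy to test without touching any server:

```shell
#!/bin/bash
# Hypothetical helper: map (primary state, standby state) to an action.
# States are the "Database cluster state" strings printed by pg_controldata.
decide_action()
{
        local primary_state="$1" standby_state="$2"
        case "${primary_state}|${standby_state}" in
                "shut down|in archive recovery")
                        echo "failover" ;;   # demote the primary, promote the standby
                "in production|in archive recovery"|"in archive recovery|in production")
                        echo "ok" ;;         # exactly one primary and one standby
                *)
                        echo "alert" ;;      # every other combination needs a human
        esac
}
```

Keeping the decision separate from the ssh calls means the table can be checked on its own, for example `decide_action "shut down" "in archive recovery"` should print failover.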

shell script

    #!/bin/bash
    PRIMARY_IP="192.168.10.2"
    STANDBY_IP="192.168.10.3"
    PGDATA="/DATA/postgresql/data"
    SYS_USER="root"
    PG_USER="postgresql"
    PGPREFIX="/opt/pgsql"

    pg_status()
    {
            # Print the cluster state that pg_controldata reports on host $1
            ssh "${SYS_USER}@$1" \
            "su - ${PG_USER} -c '${PGPREFIX}/bin/pg_controldata -D ${PGDATA}'" \
            | grep cluster | awk -F : '{print $2}' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
    }

    # recover to primary: promote the standby running on host $1
    recovery_primary()
    {
            ssh "${SYS_USER}@$1" \
            "su - ${PG_USER} -c '${PGPREFIX}/bin/pg_ctl promote -D ${PGDATA}'"
    }

    # primary to recovery: demote host $1 by restoring its recovery.conf
    primary_recovery()
    {
            ssh "${SYS_USER}@$1" \
            "su - ${PG_USER} -c 'cd ${PGDATA} && mv recovery.done recovery.conf'"
    }

    send_mail()
    {
            echo "send SNS"
    }

    case "`pg_status ${PRIMARY_IP}`" in
            "shut down")
                    case "`pg_status ${STANDBY_IP}`" in
                            "in archive recovery")
                                    primary_recovery ${PRIMARY_IP}
                                    recovery_primary ${STANDBY_IP}
                                    ;;
                            "shut down in recovery"|"in production")
                                    send_mail
                                    ;;
                    esac
                    ;;
            "in production")
                    case "`pg_status ${STANDBY_IP}`" in
                            "shut down in recovery"|"shut down"|"in production")
                                    send_mail
                                    ;;
                    esac
                    echo "primary"
                    ;;
            "in archive recovery")
                    case "`pg_status ${STANDBY_IP}`" in
                            "shut down")
                                    primary_recovery ${STANDBY_IP}
                                    recovery_primary ${PRIMARY_IP}
                                    ;;
                            "shut down in recovery"|"in archive recovery")
                                    send_mail
                                    ;;
                    esac
                    echo "recovery"
                    ;;
            "shut down in recovery")
                    case "`pg_status ${STANDBY_IP}`" in
                            "shut down in recovery"|"shut down"|"in archive recovery")
                                    send_mail
                                    ;;
                    esac
                    echo "recovery down"
                    ;;
    esac

Troubleshooting

error 1

FATAL:  no pg_hba.conf entry for replication connection from host "192.168.1.2", user "standby", SSL off

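Add a replication entry for this user and client address (192.168.1.2) to pg_hba.conf on the server that rejected the connection, with a suitable authentication method and password, then reload the configuration.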
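
As an illustration, the missing entry has the same shape as the replication lines shown earlier in this post. The helper below is a sketch with a made-up name; the md5 method in the usage example is an assumption, so use whatever authentication method your setup requires.

```shell
#!/bin/bash
# Hypothetical helper: print a pg_hba.conf line that lets one client address
# open replication connections.
# $1 = role, $2 = client IPv4 address, $3 = authentication method
hba_replication_line()
{
        printf 'host    replication     %s    %s/32    %s\n' "$1" "$2" "$3"
}
```

For the error above this could be used as `hba_replication_line standby 192.168.1.2 md5 >> "$PGDATA/pg_hba.conf"`, followed by `pg_ctl reload -D "$PGDATA"` so the server picks up the change.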

error 2

FATAL:  database system identifier differs between the primary and standby
DETAIL:  The primary's identifier is 6589099331306617531, the standby's identifier is 6605061381709180314

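The system identifier is assigned when a cluster is initialized and is carried along by a base backup, so this error means the two data directories do not descend from the same cluster. I hit it when bringing the old primary back after the standby had been promoted, without having fully resynchronized it from the new primary. The fix is to rebuild that node's data directory from a fresh base backup of the current primary.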
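
To confirm the mismatch before rebuilding anything, the identifier can be read from pg_controldata on both hosts and compared. A minimal sketch, with a made-up function name, that extracts the value from pg_controldata output on stdin:

```shell
#!/bin/bash
# Hypothetical helper: read pg_controldata output on stdin and print the
# value of the "Database system identifier" line.
system_identifier()
{
        awk -F: '/^Database system identifier/ { gsub(/[ \t]+/, "", $2); print $2 }'
}
```

Run `pg_controldata -D "$PGDATA" | system_identifier` on each host; a working primary/standby pair prints the same number on both.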

error 3

FATAL:  number of requested standby connections exceeds max_wal_senders (currently 0)

FATAL:  hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 200)

FATAL:  hot standby is not possible because max_locks_per_transaction = 64 is a lower setting than on the master server (its value was 128)

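The first message means that more standbys tried to connect than max_wal_senders allows on the server they connect to; here it was set to 0. The other two mean that max_connections or max_locks_per_transaction on the standby is lower than the value on the primary, which hot standby does not permit.
These problems showed up after the standby had been promoted and the old primary was demoted to a standby to resynchronize, so make sure these settings are identical on both servers.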
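
A quick way to catch such drift before a switchover is to compare the relevant settings in the two postgresql.conf files. The sketch below uses made-up function names and assumes the primary's file has been copied next to the standby's (for example with scp); it only handles plain numeric settings such as max_connections or max_locks_per_transaction.

```shell
#!/bin/bash
# Hypothetical helper: print the value assigned to setting $2 in config file $1.
# Commented-out lines are ignored; the last assignment wins.
conf_value()
{
        sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$1" | tail -n 1
}

# Succeed only if the standby's value for setting $3 is not lower than the primary's.
# $1 = primary's postgresql.conf, $2 = standby's postgresql.conf
standby_setting_ok()
{
        local primary_value standby_value
        primary_value="$(conf_value "$1" "$3")"
        standby_value="$(conf_value "$2" "$3")"
        [ -n "${primary_value}" ] && [ -n "${standby_value}" ] \
                && [ "${standby_value}" -ge "${primary_value}" ]
}
```

For example, `standby_setting_ok primary.conf standby.conf max_connections || echo "max_connections too low on standby"`. A setting that is missing from either file is treated as a failure, since the default in effect cannot be read from the file.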

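Afterword

Multi-master (primary-primary) replication for PostgreSQL requires third-party middleware; look up the relevant material if you need it.

This post is based on the official PostgreSQL documentation.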
