Redis Cluster Setup and Maintenance

1 Installation and Setup

1.1 Installing Redis

1.1.1 Download and build Redis

cd /opt
wget http://download.redis.io/releases/redis-4.0.9.tar.gz
yum install -y gcc gcc-c++ make
yum -y update
# The tarball unpacks into /opt/redis-4.0.9, the path used in the rest of this guide
tar -zxvf redis-4.0.9.tar.gz
cd /opt/redis-4.0.9
make

1.1.2 Install and update Ruby

yum -y install ruby ruby-devel rubygems rpm-build
yum -y install openssl-devel
curl -L get.rvm.io | bash -s stable
source /etc/profile.d/rvm.sh
rvm install 2.4.1
rvm use 2.4.1

1.1.3 Install redis.rb

gem install redis -v 3.3.3

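1.2 Cluster Configuration

1.2.1 Deployment plan

Install one copy of Redis on each machine, prepare two configuration files per machine, and start two processes from them. Three machines with two instances each give a cluster of 3 masters and 3 slaves: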
10.110.211.191:7000/7001
10.110.211.192:7002/7003
10.110.174.25:7004/7005

ssh root@10.110.211.191
mkdir -p /opt/redis-4.0.9/redis-cluster/7000

1.2.2 Instance configuration

Create the file /opt/redis-4.0.9/redis-cluster/7000/redis.conf with the following content:

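# Port: 7000, 7001, 7002, ...
port 7000
# The default bind address is 127.0.0.1. Change it to an IP that the other node machines can reach; otherwise they cannot connect to this port and the cluster cannot be created.
bind 10.110.211.191
# Run Redis in the background
daemonize yes
# PID file, one per port: 7000, 7001, 7002, ...
pidfile /var/run/redis_7000.pid
# Enable cluster mode
cluster-enabled yes
# Cluster state file; Redis generates it automatically on first start
cluster-config-file nodes_7000.conf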
#After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas.
#Similarly after node timeout has elapsed without a master node to be able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.
cluster-node-timeout 5000
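# Enable the AOF log if you need it; every write operation is appended to it
appendonly yes
# Log file path (the /var/log/redis directory must already exist)
logfile /var/log/redis/redis_7000.log
# RDB snapshot file name
dbfilename dump_7000.rdb
# Working directory; the AOF, dump, and cluster-config files are stored here
dir /opt/redis-4.0.9/redis-cluster/7000/

Configure the other instances the same way, changing the port and the paths (and the bind address on the other machines).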
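
Since the six instances differ only in port, bind address, and paths, the per-instance files can be generated instead of edited by hand. The following is a minimal Python sketch under that assumption. The names `render_conf`, `NODES`, and `BASE_DIR` are made up for illustration, and the settings are simply those of the 7000 example above.

```python
# Hypothetical helper: renders the redis.conf shown above for one instance.
BASE_DIR = "/opt/redis-4.0.9/redis-cluster"

# Layout from section 1.2.1: two instances per machine.
NODES = {
    "10.110.211.191": [7000, 7001],
    "10.110.211.192": [7002, 7003],
    "10.110.174.25": [7004, 7005],
}


def render_conf(bind_ip: str, port: int) -> str:
    """Return the redis.conf text for the instance at bind_ip:port."""
    lines = [
        f"port {port}",
        f"bind {bind_ip}",
        "daemonize yes",
        f"pidfile /var/run/redis_{port}.pid",
        "cluster-enabled yes",
        f"cluster-config-file nodes_{port}.conf",
        "cluster-node-timeout 5000",
        "appendonly yes",
        f"logfile /var/log/redis/redis_{port}.log",
        f"dbfilename dump_{port}.rdb",
        f"dir {BASE_DIR}/{port}/",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print every planned config; save each one as
    # <BASE_DIR>/<port>/redis.conf on the matching machine.
    for ip, ports in NODES.items():
        for port in ports:
            print(f"# --- {ip}:{port} ---")
            print(render_conf(ip, port))
```

Generating the files keeps the port-dependent names (pidfile, cluster-config-file, logfile, dbfilename, dir) consistent, which matters because two instances on one machine must not share these files.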

1.2.3 Starting the cluster

First start each Redis instance in turn:

ssh root@10.110.211.191
/opt/redis-4.0.9/src/redis-server /opt/redis-4.0.9/redis-cluster/7000/redis.conf
...
ssh root@10.110.174.25
/opt/redis-4.0.9/src/redis-server /opt/redis-4.0.9/redis-cluster/7005/redis.conf

Then create the cluster with redis-trib.rb:

/opt/redis-4.0.9/src/redis-trib.rb create --replicas 1 10.110.211.191:7000 10.110.211.191:7001 10.110.211.192:7002 10.110.211.192:7003 10.110.174.25:7004 10.110.174.25:7005

--replicas 1 means that each master gets one replica (slave).
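
To see why six nodes with --replicas 1 yield 3 masters and 3 slaves, the arithmetic can be sketched in Python. This is only an illustration: `plan_cluster` is a hypothetical name, and the slot boundaries show one even split of the 16384 slots, so the exact ranges redis-trib.rb chooses may differ.

```python
TOTAL_SLOTS = 16384


def plan_cluster(node_count: int, replicas: int):
    """Return (master_count, slot_ranges) for an even split of the slots.

    Every master needs `replicas` slaves, so the nodes are divided into
    groups of (replicas + 1) and each group contributes one master.
    """
    masters = node_count // (replicas + 1)
    if masters < 3:
        # Redis Cluster requires at least 3 master nodes.
        raise ValueError("Redis Cluster needs at least 3 master nodes")
    per_master = TOTAL_SLOTS / masters
    ranges, first, cursor = [], 0, 0.0
    for i in range(masters):
        last = round(cursor + per_master - 1)
        if i == masters - 1:
            last = TOTAL_SLOTS - 1  # the last master takes the remainder
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return masters, ranges


if __name__ == "__main__":
    masters, ranges = plan_cluster(node_count=6, replicas=1)
    print(masters, ranges)
```

With four nodes and --replicas 1 the function raises an error, which mirrors the fact that such a layout would leave only two masters.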

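2 Using the Cluster

2.1 Command Tools

2.1.1 Cluster information

  • cluster info: prints information about the cluster.
  • cluster nodes: lists all nodes currently known to the cluster, along with information about each of them.
  • redis-trib.rb check 192.168.252.101:7000: checks the cluster state (pass the address of any node in the cluster).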

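2.1.2 Node operations

  • cluster meet <ip> <port>: adds the node at the given ip and port to the cluster, making it a member.
  • cluster forget <node_id>: removes the node identified by node_id from the cluster.
  • cluster replicate <node_id>: makes the current node a slave of the node identified by node_id.
  • cluster saveconfig: saves the node's cluster configuration file to disk.
  • cluster failover: run it on one of the slaves of the master you want to fail over.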

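2.1.3 Slots

  • cluster addslots <slot> [slot …]: assigns one or more slots to the current node.
  • cluster delslots <slot> [slot …]: removes the assignment of one or more slots from the current node.
  • cluster flushslots: removes all slots assigned to the current node, leaving it with no slots.
  • cluster setslot <slot> node <node_id>: assigns slot to the node identified by node_id. If the slot is already assigned to another node, that node drops the slot first and then the assignment is made.
  • cluster setslot <slot> migrating <node_id>: migrates slot from this node to the node identified by node_id.
  • cluster setslot <slot> importing <node_id>: imports slot into this node from the node identified by node_id.
  • cluster setslot <slot> stable: cancels the import or migration of slot.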
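
The setslot subcommands above are normally combined in a fixed order when a slot is moved by hand. The sketch below only builds that command sequence as text; it does not talk to Redis. `slot_migration_steps` is a hypothetical helper, the key list is assumed to come from cluster getkeysinslot on the source node, and key names are assumed to contain no whitespace.

```python
def slot_migration_steps(slot, source_id, dest_id, dest_host, dest_port,
                         keys, timeout_ms=5000):
    """Return (where_to_run, command) pairs for moving one slot.

    Order: mark the slot as importing on the destination, mark it as
    migrating on the source, move the keys, then record the new owner.
    """
    steps = [
        ("destination", f"CLUSTER SETSLOT {slot} IMPORTING {source_id}"),
        ("source", f"CLUSTER SETSLOT {slot} MIGRATING {dest_id}"),
    ]
    for key in keys:
        # MIGRATE host port key destination-db timeout
        steps.append(
            ("source", f"MIGRATE {dest_host} {dest_port} {key} 0 {timeout_ms}")
        )
    steps.append(
        ("destination and source", f"CLUSTER SETSLOT {slot} NODE {dest_id}")
    )
    return steps


if __name__ == "__main__":
    # Placeholder node IDs; real IDs come from `cluster nodes`.
    for where, command in slot_migration_steps(
        12927, "<source-node-id>", "<dest-node-id>",
        "10.110.211.192", 7002, ["user:1", "user:2"],
    ):
        print(f"[{where}] {command}")
```

redis-trib.rb reshard automates this sequence, so the manual form is mainly useful for understanding or repairing a half-finished migration.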

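2.1.4 Keys

  • cluster keyslot <key>: computes which slot the given key belongs to.
  • cluster countkeysinslot <slot>: returns the number of keys currently stored in slot.
  • cluster getkeysinslot <slot> <count>: returns up to count keys from slot.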
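
What cluster keyslot computes can be reproduced offline. Per the excerpts in section 4, the slot is the CRC16 of the key modulo 16384, and when the key contains a non-empty {...} hash tag only the tag is hashed. The following is a minimal Python sketch of that rule; the function names are illustrative, and the CRC16 variant is assumed to be the XMODEM one (polynomial 0x1021, initial value 0) described in the Redis Cluster specification.

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot for key, honouring a {hash tag} if present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; with "{}" the whole key is hashed.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % HASH_SLOTS


if __name__ == "__main__":
    # Both keys share the tag "user:1000", so they land in the same slot
    # and can be used together in one multi-key command.
    print(key_slot("{user:1000}.foo"), key_slot("{user:1000}.bar"))
```

Comparing the result with cluster keyslot on a live node is a quick way to confirm the sketch matches the server's behaviour.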

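2.2 Routine Operations

2.2.1 Adding nodes

  • add master
    Start the new Redis instance, then join it to the cluster with redis-trib:
# arguments: <new node> <any existing node>
redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000

Then use the reshard command to assign slots to the new node: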

redis-trib.rb reshard 10.110.211.191:7000
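  • add slave
# arguments: <new node> <existing node>
redis-trib.rb add-node --slave --master-id 0de0233df887e024575f73f57e74a9ddaeed009d 10.110.211.192:7002 10.110.211.191:7000

Alternatively, connect with redis-cli to any empty node (one that has already joined the cluster) and use the replicate command to make it a slave of a given node: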

cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
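
When resharding toward a newly added master, redis-trib.rb asks how many slots to move. A rough target, assuming the goal is an even spread over all masters, can be worked out as follows. `slots_for_new_master` is a hypothetical helper and the per-donor figure is only an approximation of how the slots are taken from the existing masters.

```python
TOTAL_SLOTS = 16384


def slots_for_new_master(current_masters: int):
    """Return (slots_to_move, slots_per_donor, leftover) for an even spread.

    After the new master joins there are current_masters + 1 masters, so
    each should end up with about TOTAL_SLOTS / (current_masters + 1).
    """
    target = TOTAL_SLOTS // (current_masters + 1)
    per_donor, leftover = divmod(target, current_masters)
    return target, per_donor, leftover


if __name__ == "__main__":
    # Adding a fourth master to the 3-master cluster built in section 1.
    print(slots_for_new_master(3))
```

For the cluster in this guide, that means answering the reshard prompt with 4096 slots and selecting all existing masters as sources.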

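2.2.2 Removing nodes

  • Removing a slave node
# arguments: <any node in the cluster> <id of the node to remove>
redis-trib.rb del-node 127.0.0.1:7000 <node-id>
  • Removing a master node
    First reshard all of its slots to other nodes, then remove it with the del-node command.
    Alternatively, use the failover command to promote one of its slaves to master, then remove it (the number of masters does not decrease).
redis-cli -h 10.110.211.191 -p 7000 cluster failover  # note: run this on one of its slaves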

2.2.3 Restarting/upgrading nodes

  • slave
    Stop the node, upgrade it, then start it again:
redis-cli -h 10.110.211.191 -p 7001 shutdown
redis-server /opt/redis-4.0.9/redis-cluster/7001/redis.conf
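  • master
    Use cluster failover (run on one of its slaves) to turn this master into a slave, then upgrade and restart it.
    To make it a master again, run failover once more, this time on the upgraded node, which is now a slave of the new master.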

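3 Troubleshooting

3.1 Ruby version too old to install the redis gem

Upgrade Ruby with RVM as described in section 1.1.2, then run gem install again.

3.2 MIGRATE fails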

  • Symptom

    [ERR] Calling MIGRATE: ERR Syntax error, try CLIENT (LIST | KILL | GETNAME | SETNAME | PAUSE | REPLY)

  • Cause
    The installed redis.rb gem is too new and is not backward compatible.

  • Fix
    Downgrade redis.rb:

gem uninstall redis --version 4.0.x
gem install redis -v 3.3.3

Reference: https://stackoverflow.com/questions/47774093/redis-cluster-reshard-err-calling-migrate-err-syntax-error

3.3 reshard fails

  • Symptom

    Check for open slots…
    [WARNING] Node 192.168.44.189:6631 has slots in importing state (12927).
    [WARNING] The following slots are open: 12927

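  • Cause
    An earlier command failed and left a slot in an inconsistent state in the cluster.

  • Fix
    Try the redis-trib.rb fix command first. If that does not help, use the setslot command to set the slot state to stable: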

redis-cli -p 6631 CLUSTER SETSLOT 12927 STABLE

Reference: https://github.com/antirez/redis/issues/2776

3.4 add-node fails

  • Symptom

    [ERR] Node 10.110.211.192:7002 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

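  • Cause
    Several Redis processes were started on one machine with the same AOF and dump file locations, or a node that used to be online reconnected after being disconnected for too long. In either case the Redis db is not empty, so the node cannot join the cluster.

  • Fix
    Delete the cluster-config-file named in the node's configuration (for example nodes_7000.conf) and clear the data in Redis, then add the node again: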

redis-cli -h 10.110.211.192 -p 7002 flushdb

3.5 A cluster node is in the failed state

  • Symptom

redis-cli -h 10.110.211.192 -p 7002 cluster nodes
……
141c68cc373c29fef2b33ee93b64a3425a475275 :0@0 slave,fail,noaddr 1c80515b06f44df77151fcb5dac1c0f3eb499874 1528451188725 1528451188000 13 disconnected
……

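  • Cause
    After a restart the node restored its data from the dump while the cluster had not forgotten the node, or the data became inconsistent for some other reason and can no longer be synchronized.

  • Fix
    On the healthy cluster nodes, forget this node; clear its data with flushdb; then add it again with add-node: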

redis-cli -h 10.110.211.191 -p 7000 cluster forget 141c68cc373c29fef2b33ee93b64a3425a475275
redis-cli -h 10.110.211.191 -p 7001 flushdb
redis-trib.rb add-node --slave --master-id 1c80515b06f44df77151fcb5dac1c0f3eb499874 10.110.211.191:7001 10.110.211.191:7000
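
Failed nodes can be picked out of the cluster nodes output shown in the symptom by reading the flags and link-state columns. Below is a small Python sketch of that parsing. `NodeLine`, `parse_node_line`, and `is_failed` are illustrative names, and the column order (id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, slots) is assumed to be the one produced by Redis 4.0.

```python
from typing import List, NamedTuple


class NodeLine(NamedTuple):
    node_id: str
    addr: str
    flags: List[str]
    master_id: str
    link_state: str
    slots: List[str]


def parse_node_line(line: str) -> NodeLine:
    """Split one line of `cluster nodes` output into named fields."""
    parts = line.split()
    return NodeLine(
        node_id=parts[0],
        addr=parts[1],
        flags=parts[2].split(","),
        master_id=parts[3],
        link_state=parts[7],
        slots=parts[8:],  # empty for slaves
    )


def is_failed(node: NodeLine) -> bool:
    # "fail" is the confirmed failure flag; "fail?" (suspected) is not matched.
    return "fail" in node.flags


if __name__ == "__main__":
    sample = ("141c68cc373c29fef2b33ee93b64a3425a475275 :0@0 slave,fail,noaddr "
              "1c80515b06f44df77151fcb5dac1c0f3eb499874 1528451188725 "
              "1528451188000 13 disconnected")
    node = parse_node_line(sample)
    if is_failed(node):
        # The ID printed here is what `cluster forget` expects.
        print("forget candidate:", node.node_id, "master:", node.master_id)
```

The master_id field is also the value passed to --master-id when the node is added back as a slave.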

3.6 Cluster creation fails

  • Symptom
    redis-trib.rb create throws an error while creating the cluster:

    ERR Slot 12730 is already busy (Redis::CommandError)

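  • Cause
    An earlier failed creation attempt left the slot in the busy state.

  • Fix
    Log in to the affected node, run flushall and cluster reset soft, then delete the nodes.conf file and run the cluster creation again.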

4 Excerpts from the Official Documentation

  • Every Redis Cluster node requires two TCP connections open. The normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port.
  • Redis Cluster does not support NATted environments. In order to make Docker compatible with Redis Cluster you need to use the host networking mode of Docker.
  • There are 16384(4k * 4) hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384. Every node in a Redis Cluster is responsible for a subset of the hash slots.
  • Redis Cluster supports multiple key operations as long as all the keys involved into a single command execution (or whole transaction, or Lua script execution) all belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a concept called hash tags. If there is a substring between {} brackets in a key, only what is inside the string is hashed.
  • Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slaves nodes).
  • Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
  • The first reason why Redis Cluster can lose writes is because it uses asynchronous replication. Redis Cluster has support for synchronous writes when absolutely needed, implemented via the WAIT command, but this usually results into prohibitively low performance. Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible under more complex failure scenarios that a slave that was not able to receive the write is elected as master.
  • After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly after node timeout has elapsed without a master node to be able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.
  • Multiple keys operations, transactions, or Lua scripts involving multiple keys are used but only with keys having the same hash tag, which means that the keys used together all have a {…} sub-string that happens to be identical. For example the following multiple keys operation is defined in the context of the same hash tag: SUNION {user:1000}.foo {user:1000}.bar.
  • Redis Cluster configuration parameters:
    cluster-enabled <yes/no>
    cluster-config-file <filename>
    cluster-node-timeout <milliseconds>
    cluster-slave-validity-factor <factor>
    cluster-migration-barrier <count>
    cluster-require-full-coverage <yes/no>
  • A serious client is able to do better than that, and cache the map between hash slots and nodes addresses, to directly use the right connection to the right node.
  • A cluster where every master has a single replica can’t continue operations if the master and its replica fail at the same time, simply because there is no other instance to have a copy of the hash slots the master was serving.
  • The cluster will try to migrate a replica from the master that has the greatest number of replicas in a given moment to an orphaned master. To benefit from replica migration you have just to add a few more replicas to a single master in your cluster, it does not matter what master.
  • Application with multiple keys operations, transactions, or Lua scripts involving multiple keys are used with key names not having an explicit, or the same, hash tag: requires to be modified in order to don’t use multi keys operations or only use them in the context of the same hash tag.
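
The first excerpt has a practical consequence for firewalls: every instance needs its client port and the port obtained by adding 10000 to it reachable from the other nodes. A small Python sketch of that rule follows; `ports_to_open` is a hypothetical helper, and the sample ports are the ones planned in section 1.2.1.

```python
BUS_PORT_OFFSET = 10000  # second port = client port + 10000


def ports_to_open(client_ports):
    """Return the sorted TCP ports that must be reachable between nodes."""
    ports = set()
    for port in client_ports:
        ports.add(port)                    # normal client port
        ports.add(port + BUS_PORT_OFFSET)  # node-to-node port
    return sorted(ports)


if __name__ == "__main__":
    # Ports of the two instances on 10.110.211.191.
    print(ports_to_open([7000, 7001]))
```

If only the client ports are opened, the nodes can answer redis-cli but may be unable to reach each other, which is one possible reason for cluster creation hanging.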

5 References

Setting Up a Redis-4.0.1 Cluster Service on CentOS 7.3
Redis Quick Start
Redis cluster tutorial
Redis Cluster Specification
Life in a Redis Cluster: Meet and Gossip with your neighbors
