Preface
The previous posts deployed a Redis cluster of 4 master-slave pairs plus 1 master-slave pair serving as a session backup. If a node in the Redis cluster goes down, how do we keep the service available? In this article we add and start a Sentinel service on the session servers and test how the cluster copes with failure.
1. Adding Sentinel
a. Reorganizing the cluster
Since all cluster connections go over the internal network (as they would in a real deployment), the container start commands and configuration files are changed to drop the Redis cluster's public port mappings and the CLI connection password. In addition, this many nodes is awkward to manage, so the cluster is shrunk a little.
It is reduced to a password-free cluster of 3 master-slave pairs by removing clm4 and cls4:
/ # redis-cli --cluster del-node 172.1.30.21:6379 c2b42a6c35ab6afb1f360280f9545b3d1761725e
>>> Removing node c2b42a6c35ab6afb1f360280f9545b3d1761725e from cluster 172.1.30.21:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
/ # redis-cli --cluster del-node 172.1.50.21:6379 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47
>>> Removing node 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47 from cluster 172.1.50.21:6379
[ERR] Node 172.1.50.21:6379 is not empty! Reshard data away and try again.
# The node's slots must be emptied first (rebalance it down to weight=0)
/ # redis-cli --cluster rebalance 172.1.50.21:6379 --cluster-weight 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47=0
Moving 2186 slots from 172.1.50.21:6379 to 172.1.30.11:6379
###
Moving 2185 slots from 172.1.50.21:6379 to 172.1.50.11:6379
###
Moving 2185 slots from 172.1.50.21:6379 to 172.1.50.12:6379
###
/ # redis-cli --cluster del-node 172.1.50.21:6379 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47
>>> Removing node 6d1b7a14a6d0be55a5fcb9266358bd1a42244d47 from cluster 172.1.50.21:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
The scale-down succeeded.
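As a sanity check, the cluster can be inspected from any surviving node to confirm that only the 3 remaining master-slave pairs are left and that all 16384 slots are still covered (a quick sketch; any of the remaining masters works):
/ # redis-cli --cluster check 172.1.50.11:6379
/ # redis-cli -h 172.1.50.11 cluster nodes
The first command reports slot coverage and the master/slave counts; the second lists the node table, in which clm4/cls4 should no longer appear.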
b. Starting Sentinel
Here the container start commands for rm/rs are changed to:
docker run --name rm \
--restart=always \
--network=mybridge --ip=172.1.13.11 \
-v /root/tmp/dk/redis/data:/data \
-v /root/tmp/dk/redis/redis.conf:/etc/redis/redis.conf \
-v /root/tmp/dk/redis/sentinel.conf:/etc/redis/sentinel.conf \
-d cffycls/redis5:1.7
docker run --name rs \
--restart=always \
--network=mybridge --ip=172.1.13.12 \
-v /root/tmp/dk/redis_slave/data:/data \
-v /root/tmp/dk/redis_slave/redis.conf:/etc/redis/redis.conf \
-v /root/tmp/dk/redis_slave/sentinel.conf:/etc/redis/sentinel.conf \
-d cffycls/redis5:1.7
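Compared with the earlier deployment, these commands no longer publish any ports to the host, and the password settings were dropped from the Redis config files. The removed directives would have been along these lines (a sketch only; the original values are not shown in this article, and masterauth applies only where a slave authenticated against a password-protected master):
# requirepass yourpassword
# masterauth yourpassword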
Referring to 《redis 集群实现 (六) 容灾与宕机恢复》 and 《Redis 及其 Sentinel 配置项详细说明》, modify the configuration file (sentinel.conf):
# Directory where Sentinel stores any data it generates
dir /data/sentinel
#<master-name> <ip> <redis-port> <quorum>
# monitored master name, ip, port, and the minimum number of sentinels that must agree (quorum)
sentinel monitor mymaster1 172.1.50.11 6379 2
sentinel monitor mymaster2 172.1.50.12 6379 2
sentinel monitor mymaster3 172.1.50.13 6379 2
#sentinel down-after-milliseconds <master-name> <milliseconds>
# monitored master name and the timeout after which the node is considered down
# Default is 30 seconds.
sentinel down-after-milliseconds mymaster1 30000
sentinel down-after-milliseconds mymaster2 30000
sentinel down-after-milliseconds mymaster3 30000
#sentinel parallel-syncs <master-name> <numslaves>
# monitored master name; set to 1 so that only one slave at a time is unable to serve requests while resyncing
sentinel parallel-syncs mymaster1 1
sentinel parallel-syncs mymaster2 1
sentinel parallel-syncs mymaster3 1
# Default value
# Default is 3 minutes.
sentinel failover-timeout mymaster1 180000
sentinel failover-timeout mymaster2 180000
sentinel failover-timeout mymaster3 180000
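Once a Sentinel is running with this file, a client can ask it which node currently serves each monitored master, for example (a sketch, assuming Sentinel listens on its default port 26379):
/ # redis-cli -p 26379 sentinel get-master-addr-by-name mymaster1
This returns the IP and port of the current master for mymaster1, which changes automatically after a failover.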
Create the corresponding directories (xx/data/sentinel), restart the two containers, and enter rm:
/ # redis-sentinel /etc/redis/sentinel.conf
... ...
14:X 11 Jul 2019 18:25:24.418 # +monitor master mymaster3 172.1.50.13 6379 quorum 2
14:X 11 Jul 2019 18:25:24.419 # +monitor master mymaster1 172.1.50.11 6379 quorum 2
14:X 11 Jul 2019 18:25:24.419 # +monitor master mymaster2 172.1.50.12 6379 quorum 2
14:X 11 Jul 2019 18:25:24.421 * +slave slave 172.1.30.12:6379 172.1.30.12 6379 @ mymaster1 172.1.50.11 6379
14:X 11 Jul 2019 18:25:24.425 * +slave slave 172.1.30.13:6379 172.1.30.13 6379 @ mymaster2 172.1.50.12 6379
14:X 11 Jul 2019 18:26:14.464 # +sdown master mymaster3 172.1.50.13 6379
"There is no need to monitor the slaves: once the master is monitored, its slaves are added to Sentinel automatically."
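This can be confirmed from the Sentinel side, for instance by listing what it has discovered for one of the monitored masters (again assuming the default port 26379):
/ # redis-cli -p 26379 sentinel slaves mymaster1
/ # redis-cli -p 26379 sentinel masters
The first command should show the auto-discovered slave 172.1.30.12:6379 from the log above; the second summarizes the state of all monitored masters.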