About High Availability: Tech Sharing - orchestrator Operations: Configuring and Testing Automatic Cluster Failover

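Author: Yao Song

An Earthling who loves music, anime, movies, games, the humanities, food, travel, and more. Not good at any of them, but they're hobbies after all.

Source: original submission

* Produced by the ActionSky open-source community. Original content may not be used without authorization; to republish, please contact the editor and credit the source.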

Parameter reference: https://github.com/openark/or…

Purpose: use orchestrator to configure automatic failover for a MySQL cluster.

Managed database instances (1 master, 1 replica):

10.186.65.5:3307
10.186.65.11:3307

orchestrator parameters used in this test:

  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "ReplicationLagQuery": "show slave status",
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "FailMasterPromotionOnLagMinutes": 1,
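Before starting the tests, it can help to confirm that the configuration file actually contains every key listed above. This is a minimal grep-based sketch: the function name `check_recovery_keys` and the config path in the usage comment are assumptions, and the check only verifies that each key appears. It does not validate JSON syntax or the values.

```bash
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): check that the recovery-related keys
# used in this test appear in an orchestrator config file.
# Does NOT validate JSON syntax or the configured values.

# Keys taken from the parameter list in this article.
RECOVERY_KEYS=(
  RecoveryIgnoreHostnameFilters
  RecoverMasterClusterFilters
  RecoverIntermediateMasterClusterFilters
  ReplicationLagQuery
  ApplyMySQLPromotionAfterMasterFailover
  FailMasterPromotionOnLagMinutes
)

# check_recovery_keys FILE
# Prints each missing key to stderr; returns 0 only if every key is found.
check_recovery_keys() {
  local file="$1" key missing=0
  for key in "${RECOVERY_KEYS[@]}"; do
    # Match the quoted key followed by optional whitespace and a colon.
    if ! grep -Eq "\"${key}\"[[:space:]]*:" "$file"; then
      echo "missing: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (the path is an assumption; adjust to your deployment):
#   check_recovery_keys /etc/orchestrator.conf.json && echo "all recovery keys present"
```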

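Selected test scenarios. Because orchestrator itself runs as a high-availability (raft) deployment, all of the following commands are executed on the raft-leader node.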
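Because the commands must run on the raft leader, a small guard can confirm this before each test. This is a minimal sketch that makes two assumptions you should verify against your orchestrator version and configuration. First, the HTTP API listens on `127.0.0.1:3000`, the default listen port. Second, the `/api/leader-check` endpoint answers HTTP 200 only on the current raft leader. The function name `is_raft_leader` is hypothetical.

```bash
# is_raft_leader [URL]   (hypothetical helper)
# Returns 0 if the orchestrator node at URL reports itself as raft leader.
# Assumption: /api/leader-check returns HTTP 200 on the leader and a
# non-200 status on followers.
is_raft_leader() {
  local url="${1:-http://127.0.0.1:3000/api/leader-check}"
  local code
  # -s: silent; -o /dev/null: discard the body; -w: print only the status code.
  # --max-time keeps the check from hanging on an unreachable node.
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url") || return 1
  [ "$code" = "200" ]
}

# Example guard before running a test case:
#   is_raft_leader || { echo "not the raft leader, aborting" >&2; exit 1; }
```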

Case 1:

Scenario: stop the master and check whether a failover occurs (replication lag < FailMasterPromotionOnLagMinutes).

Steps:

# List the known clusters
  orchestrator-client -c clusters
# View the topology; the cluster is 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
# Stop MySQL on the master node
  ssh <user>@<master-host> "service mysqld_3307 stop"
# List the clusters again; the original cluster is now split into 2 clusters
  orchestrator-client -c clusters
# View the topology; the cluster is now 10.186.65.5:3307
  orchestrator-client -c topology -i 10.186.65.5:3307
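Failover is not instantaneous. orchestrator first has to detect the dead master and then run the recovery, so checks run immediately after stopping the master may still show the old state. One option is to poll until the expected cluster appears. This is a minimal sketch; the helper name `wait_for_output`, the retry count, and the interval are assumptions.

```bash
# wait_for_output PATTERN TRIES INTERVAL CMD [ARGS...]   (hypothetical helper)
# Runs CMD up to TRIES times, INTERVAL seconds apart, until its output
# contains PATTERN as a fixed string. Returns 0 on match, 1 on timeout.
wait_for_output() {
  local pattern="$1" tries="$2" interval="$3"
  shift 3
  local i
  for ((i = 1; i <= tries; i++)); do
    if "$@" 2>/dev/null | grep -qF -- "$pattern"; then
      return 0
    fi
    # Do not sleep after the last attempt.
    (( i < tries )) && sleep "$interval"
  done
  return 1
}

# Example: after stopping the master, wait up to about 60 s for the new cluster.
#   wait_for_output "10.186.65.5:3307" 12 5 orchestrator-client -c clusters
```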

Conclusion:

The failover succeeded. On the new master, read_only and super_read_only are both turned off, so it accepts both reads and writes.
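To check the conclusion above from the command line, you can query both variables on the new master (10.186.65.5:3307 in this test). This is a minimal sketch. The mysql client invocation (host, port, and credentials) is an assumption about your environment. `is_writable` is a hypothetical helper that parses the tab-separated output of `mysql -N -B`.

```bash
# is_writable "<read_only><TAB><super_read_only>"   (hypothetical helper)
# Returns 0 when both values are 0 (OFF), i.e. the instance accepts writes.
is_writable() {
  local read_only super_read_only
  IFS=$'\t' read -r read_only super_read_only <<< "$1"
  [ "$read_only" = "0" ] && [ "$super_read_only" = "0" ]
}

# Example against the new master (add -u/-p credentials as needed):
#   out=$(mysql -h 10.186.65.5 -P 3307 -N -B \
#         -e "SELECT @@global.read_only, @@global.super_read_only")
#   is_writable "$out" && echo "new master is writable"
```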

Case 2:

Scenario: stop the master and check whether a failover occurs (replication lag > FailMasterPromotionOnLagMinutes).

FailMasterPromotionOnLagMinutes is set to 1 minute, i.e. 60 seconds.
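The rule being tested can be written out explicitly. This is a minimal sketch of the comparison, not orchestrator code. It assumes, per orchestrator's configuration documentation, that promotion fails when the candidate replica's lag in seconds is greater than or equal to FailMasterPromotionOnLagMinutes × 60, and that a value of 0 disables the check. Treat the exact boundary as an assumption. `promotion_decision` is a hypothetical helper.

```bash
# promotion_decision LAG_SECONDS FAIL_MINUTES   (hypothetical helper)
# Prints "allowed" or "blocked" for a candidate replica with the given lag.
promotion_decision() {
  local lag_seconds="$1" fail_minutes="$2"
  # Assumption: a threshold of 0 (or less) disables the check.
  if (( fail_minutes <= 0 )); then
    echo "allowed"
  # Assumption: lag >= threshold blocks the promotion.
  elif (( lag_seconds >= fail_minutes * 60 )); then
    echo "blocked"
  else
    echo "allowed"
  fi
}

# With FailMasterPromotionOnLagMinutes = 1 (60 s), as configured here:
#   promotion_decision 30 1    # allowed  (lag below the threshold, as in case 1)
#   promotion_decision 120 1   # blocked  (replica delayed by 120 s, as in case 2)
```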

Steps:

# Check the runtime value of FailMasterPromotionOnLagMinutes
  orchestrator -c dump-config --ignore-raft-setup | jq .FailMasterPromotionOnLagMinutes
# List the known clusters
  orchestrator-client -c clusters
# View the topology; the cluster is 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
# Create a delayed replica (assume the replica is 10.186.65.5:3307);
# run on the replica, adding mysql credentials (-u/-p) as needed
  mysql -h 10.186.65.5 -P 3307 -e "STOP SLAVE; CHANGE MASTER TO MASTER_DELAY = 120; START SLAVE;"
# or, via orchestrator
  orchestrator-client -c delay-replication -i 10.186.65.5:3307 -S 120
# Wait 120 s
  sleep 120
# View the topology; the cluster is 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
# Stop MySQL on the master node
  ssh <user>@<master-host> "service mysqld_3307 stop"
# List the clusters again
  orchestrator-client -c clusters
# View the topology; the cluster is still 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
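Before stopping the master in the steps above, it is worth confirming that the delay is actually in place on the replica: SQL_Delay should be 120. This is a minimal sketch that assumes the `mysql` client can reach the replica. `slave_status_field` is a hypothetical helper that extracts one field from the vertical (`\G`) output of `SHOW SLAVE STATUS`.

```bash
# slave_status_field FIELD  < output of: SHOW SLAVE STATUS\G   (hypothetical helper)
# Prints the value of FIELD (e.g. SQL_Delay, Seconds_Behind_Master).
slave_status_field() {
  awk -v f="$1" '
    {
      i = index($0, ":")
      if (i == 0) next                      # skip the "*** 1. row ***" banner
      key = substr($0, 1, i - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)      # trim the right-aligned field name
      if (key == f) {
        val = substr($0, i + 1)
        gsub(/^[ \t]+/, "", val)            # trim the space after the colon
        print val
        exit
      }
    }'
}

# Example on the delayed replica (add -u/-p credentials as needed):
#   status=$(mysql -h 10.186.65.5 -P 3307 -e "SHOW SLAVE STATUS\G")
#   echo "SQL_Delay=$(slave_status_field SQL_Delay <<< "$status")"
#   echo "Seconds_Behind_Master=$(slave_status_field Seconds_Behind_Master <<< "$status")"
```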

Conclusion:

No failover occurred. When the replica's lag is greater than FailMasterPromotionOnLagMinutes, orchestrator does not fail over.

Case 3:

Scenario: with global recoveries disabled, stop the master (replication lag < FailMasterPromotionOnLagMinutes).

Steps:

# Disable global recoveries, then confirm the setting
  orchestrator-client -c disable-global-recoveries
  orchestrator-client -c check-global-recoveries
# List the known clusters
  orchestrator-client -c clusters
# View the topology; the cluster is 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
# Stop MySQL on the master node
  ssh <user>@<master-host> "service mysqld_3307 stop"
# List the clusters again
  orchestrator-client -c clusters
# View the cluster topology
  orchestrator-client -c topology -i 10.186.65.11:3307

Conclusion:

No failover occurred. With global recoveries disabled, orchestrator does not perform a failover.
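Disabling global recoveries turns off automatic failover for every cluster orchestrator manages. It is therefore easy to forget to re-enable it after this test with `orchestrator-client -c enable-global-recoveries`. This is a minimal sketch of a guard. It assumes that `check-global-recoveries` prints `enabled` or `disabled`. `require_global_recoveries` is a hypothetical helper.

```bash
# require_global_recoveries STATUS   (hypothetical helper)
# STATUS is the output of: orchestrator-client -c check-global-recoveries
# Assumption: that output is "enabled" or "disabled".
# Returns 0 if recoveries are enabled; otherwise warns on stderr and returns 1.
require_global_recoveries() {
  local status
  # Strip whitespace and newlines from the command output.
  status=$(printf '%s' "$1" | tr -d '[:space:]')
  if [ "$status" = "enabled" ]; then
    return 0
  fi
  echo "global recoveries are '${status:-unknown}'; automatic failover will not run" >&2
  echo "re-enable with: orchestrator-client -c enable-global-recoveries" >&2
  return 1
}

# Example:
#   require_global_recoveries "$(orchestrator-client -c check-global-recoveries)"
```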

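Summary:

Once orchestrator is set up, you can configure which clusters are eligible for automatic failover. The relevant parameters include, but are not limited to: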

RecoveryIgnoreHostnameFilters

RecoverMasterClusterFilters

RecoverIntermediateMasterClusterFilters
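The following is a rough illustration of how a cluster filter list selects clusters. It is a simplified sketch that assumes a filter matches when it equals the cluster name, is the wildcard `*`, or is a regular expression matching the cluster name. It only approximates orchestrator's own matching, which may differ in details; bash also uses POSIX extended regular expressions rather than Go's. `cluster_matches_filters` is a hypothetical helper.

```bash
# cluster_matches_filters CLUSTER FILTER...   (hypothetical helper)
# Returns 0 if any filter selects CLUSTER (simplified approximation).
cluster_matches_filters() {
  local cluster="$1" filter
  shift
  for filter in "$@"; do
    [ -z "$filter" ] && continue              # ignore empty filters
    if [ "$filter" = "*" ] || [ "$filter" = "$cluster" ]; then
      return 0
    fi
    # Otherwise treat the filter as an unanchored regular expression.
    if [[ "$cluster" =~ $filter ]]; then
      return 0
    fi
  done
  return 1
}

# With "RecoverMasterClusterFilters": ["*"] as configured here, every cluster matches:
#   cluster_matches_filters "10.186.65.11:3307" "*" && echo "eligible for automatic master recovery"
```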

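You can also configure the conditions under which a failover is allowed. The relevant parameters include, but are not limited to: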

FailMasterPromotionOnLagMinutes

ReplicationLagQuery

When the lag exceeds FailMasterPromotionOnLagMinutes minutes, the promotion fails. When global recoveries are disabled, no automatic failover takes place.

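Disclaimer:

There are many possible test scenarios, but testing time was limited; specific scenarios can be tested in detail when the need arises.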

If your test results differ, discussion is welcome.

Various other factors may also prevent a failover; they are outside the scope of this article.
