Original content of the GreatSQL community may not be used without authorization; to reprint, please contact the editor and credit the source.
1. Background and an Introduction to the GreatDB Distributed Deployment Mode
1.1 Background
Chaos testing is an excellent way to probe the uncertainty of a distributed system and build confidence in its resilience, so we use the open-source tool Chaos Mesh to run chaos tests against GreatDB distributed clusters.
1.2 Introduction to the GreatDB Distributed Deployment Mode
GreatDB is a relational database software product that supports both centralized and distributed deployment modes; this article covers the distributed deployment mode.
The distributed deployment mode uses a shared-nothing architecture: data redundancy and replica management ensure the database has no single point of failure; data sharding and distributed parallel computing give the database system high performance; and data nodes can be scaled out dynamically without limit to meet business needs.
The overall architecture is shown in the figure below:
2. Environment Preparation
2.1 Installing Chaos Mesh
Before installing Chaos Mesh, make sure Helm and Docker are already installed and a Kubernetes environment is available.
1) Add the Chaos Mesh repository to Helm:
helm repo add chaos-mesh https://charts.chaos-mesh.org
2) Check which Chaos Mesh versions are available for installation:
helm search repo chaos-mesh
3) Create the namespace in which Chaos Mesh will be installed:
kubectl create ns chaos-testing
4) Install Chaos Mesh in the Docker environment:
helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-testing
Verify the installation
Run the following command to check that Chaos Mesh is running:
kubectl get pod -n chaos-testing
The expected output is:
NAME                                       READY   STATUS    RESTARTS   AGE
chaos-controller-manager-d7bc9ccb5-dbccq   1/1     Running   0          26d
chaos-daemon-pzxc7                         1/1     Running   0          26d
chaos-dashboard-5887f7559b-kgz46           1/1     Running   1          26d
If all three pods are in the Running state, Chaos Mesh has been installed successfully.
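Optionally, Chaos Mesh also ships a web dashboard for creating and observing experiments. A minimal sketch for reaching it locally, assuming the default service name and port created by the Helm chart (chaos-dashboard on 2333):

kubectl port-forward -n chaos-testing svc/chaos-dashboard 2333:2333
# then open http://localhost:2333 in a browser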
2.2 Preparing the Images Required for Testing
2.2.1 Prepare the MySQL image
MySQL normally uses the official 5.7 image, and mysqld-exporter is used as the MySQL metrics exporter; both can be pulled directly from Docker Hub:
docker pull mysql:5.7
docker pull prom/mysqld-exporter
2.2.2 Prepare the ZooKeeper image
ZooKeeper uses the official 3.5.5 image; the monitoring components involved are jmx-prometheus-exporter and zookeeper-exporter, all pulled from Docker Hub:
docker pull zookeeper:3.5.5
docker pull sscaling/jmx-prometheus-exporter
docker pull josdotso/zookeeper-exporter
2.2.3 Prepare the GreatDB image
Take a GreatDB tar package and extract it to get a ./greatdb directory, then copy the greatdb-service-docker.sh file into that ./greatdb directory:
cp greatdb-service-docker.sh ./greatdb/
Put the GreatDB Dockerfile in the directory that contains the ./greatdb folder, then run the following command to build the GreatDB image:
docker build -t greatdb/greatdb:tag2021 .
2.2.4 Prepare the GreatDB cluster setup/cleanup images
Download the cluster setup script cluster-setup, the cluster initialization script init-zk, and the cluster Helm charts package (available on request from the 4.0 development/testing team).
Put the above materials in the same directory and write the following Dockerfile:
FROM debian:buster-slim as init-zk
COPY ./init-zk /root/init-zk
RUN chmod +x /root/init-zk

FROM debian:buster-slim as cluster-setup
# Set aliyun repo for speed
RUN sed -i 's/deb.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.debian.org/mirrors.aliyun.com/g' /etc/apt/sources.list
RUN apt-get -y update && \
    apt-get -y install \
    curl \
    wget
RUN curl -L https://storage.googleapis.com/kubernetes-release/release/v1.20.1/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && \
    chmod +x /usr/local/bin/kubectl && \
    mkdir /root/.kube && \
    wget https://get.helm.sh/helm-v3.5.3-linux-amd64.tar.gz && \
    tar -zxvf helm-v3.5.3-linux-amd64.tar.gz && \
    mv linux-amd64/helm /usr/local/bin/helm
COPY ./config /root/.kube/
COPY ./helm /helm
COPY ./cluster-setup /
Run the following commands to build the required images:
docker build --target init-zk -t greatdb/initzk:latest .
docker build --target cluster-setup -t greatdb/cluster-setup:v1 .
2.2.5 Prepare the test case images
The test cases currently supported include bank, bank2, pbank, tpcc, flashback, and others; each test case is a standalone executable.
Taking the flashback test case as an example, download the test case locally and then, in the same directory as the executable, write a Dockerfile with the following content:
FROM debian:buster-slim
COPY ./flashback /
RUN cd / && chmod +x ./flashback
Run the following command to build the test case image:
docker build -t greatdb/testsuite-flashback:v1 .
2.3 Pushing the Prepared Images to a Private Registry
For how to create a private registry and push images, see: https://zhuanlan.zhihu.com/p/...
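As a rough sketch of that push step, assuming a private registry reachable at registry.example.com:5000 (a placeholder address, not part of the original setup), the images built above would be re-tagged and pushed like this:

# tag the local images with the registry prefix, then push them
docker tag greatdb/greatdb:tag2021 registry.example.com:5000/greatdb/greatdb:tag2021
docker push registry.example.com:5000/greatdb/greatdb:tag2021
docker tag greatdb/testsuite-flashback:v1 registry.example.com:5000/greatdb/testsuite-flashback:v1
docker push registry.example.com:5000/greatdb/testsuite-flashback:v1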
3. Using Chaos Mesh
3.1 Setting Up the GreatDB Distributed Cluster
In the cluster-setup directory from section 2.2.4, run the following command to set up the test cluster:
./cluster-setup \
  -clustername=c0 \
  -namespace=test \
  -enable-monitor=true \
  -mysql-image=mysql:5.7 \
  -mysql-replica=3 \
  -mysql-auth=1 \
  -mysql-normal=1 \
  -mysql-global=1 \
  -mysql-partition=1 \
  -zookeeper-repository=zookeeper \
  -zookeeper-tag=3.5.5 \
  -zookeeper-replica=3 \
  -greatdb-repository=greatdb/greatdb \
  -greatdb-tag=tag202110 \
  -greatdb-replica=3 \
  -greatdb-serviceHost=172.16.70.249
Output:
liuxinle@liuxinle-OptiPlex-5060:~/k8s/cluster-setup$ ./cluster-setup \
> -clustername=c0 \
> -namespace=test \
> -enable-monitor=true \
> -mysql-image=mysql:5.7 \
> -mysql-replica=3 \
> -mysql-auth=1 \
> -mysql-normal=1 \
> -mysql-global=1 \
> -mysql-partition=1 \
> -zookeeper-repository=zookeeper \
> -zookeeper-tag=3.5.5 \
> -zookeeper-replica=3 \
> -greatdb-repository=greatdb/greatdb \
> -greatdb-tag=tag202110 \
> -greatdb-replica=3 \
> -greatdb-serviceHost=172.16.70.249
INFO[2021-10-14T10:41:52+08:00] SetUp the cluster ...                         NameSpace=test
INFO[2021-10-14T10:41:52+08:00] create namespace ...
INFO[2021-10-14T10:41:57+08:00] copy helm chart templates ...
INFO[2021-10-14T10:41:57+08:00] setup ...                                     Component=MySQL
INFO[2021-10-14T10:41:57+08:00] exec helm install and update greatdb-cfg.yaml ...
INFO[2021-10-14T10:42:00+08:00] waiting mysql pods running ...
INFO[2021-10-14T10:44:27+08:00] setup ...                                     Component=Zookeeper
INFO[2021-10-14T10:44:28+08:00] waiting zookeeper pods running ...
INFO[2021-10-14T10:46:59+08:00] update greatdb-cfg.yaml
INFO[2021-10-14T10:46:59+08:00] setup ...                                     Component=greatdb
INFO[2021-10-14T10:47:00+08:00] waiting greatdb pods running ...
INFO[2021-10-14T10:47:21+08:00] waiting cluster running ...
INFO[2021-10-14T10:47:27+08:00] waiting prometheus server running...
INFO[2021-10-14T10:47:27+08:00] Dump Cluster Info
INFO[2021-10-14T10:47:27+08:00] SetUp success.                                ClusterName=c0 NameSpace=test
When the c0-zookeeper-initzk-7hbfs pod shows Completed and all other pods show Running, the cluster has been set up successfully.
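As an optional sanity check (our own suggestion, not a step from the original procedure), the GreatDB service should be reachable with an ordinary MySQL client, assuming the serviceHost and port used elsewhere in this article (172.16.70.249 and 30901):

# connect to the GreatDB SQL entry point and print the server version
mysql -h 172.16.70.249 -P 30901 -u root -p -e "SELECT VERSION();"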
3.2 Running Chaos Tests on the GreatDB Distributed Cluster with Chaos Mesh
The fault types Chaos Mesh can inject in a Kubernetes environment include simulated Pod faults, simulated network faults, simulated stress scenarios, and more. Here we take pod-kill, one of the Pod fault types, as an example.
Write the experiment configuration into a file named pod-kill.yaml, for example:
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos                   # the fault type to inject
metadata:
  name: pod-failure-example
  namespace: test                # the namespace where the test cluster pods run
spec:
  action: pod-kill               # the specific fault action to inject
  mode: all                      # how targets are selected; all means every pod matching the selector
  duration: '30s'                # how long the experiment lasts
  selector:
    labelSelectors:
      "app.kubernetes.io/component": "greatdb"   # label of the target pods, taken from the Labels field in the output of kubectl describe pod c0-greatdb-1 -n test
Create the chaos experiment with the following command:
kubectl create -n test -f pod-kill.yaml
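Chaos Mesh registers PodChaos as a custom resource, so the experiment object itself can also be inspected with kubectl; a quick sketch using the names defined above:

kubectl get podchaos -n test
kubectl describe podchaos pod-failure-example -n test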
After the experiment is created, run kubectl get pod -n test -o wide; the result looks like this (the two c0-greatdb pods are in ContainerCreating because pod-kill has just terminated them and Kubernetes is recreating them):
NAME                                    READY   STATUS              RESTARTS   AGE     IP             NODE                     NOMINATED NODE   READINESS GATES
c0-auth0-mysql-0                        2/2     Running             0          14m     10.244.87.18   liuxinle-optiplex-5060   <none>           <none>
c0-auth0-mysql-1                        2/2     Running             0          14m     10.244.87.54   liuxinle-optiplex-5060   <none>           <none>
c0-auth0-mysql-2                        2/2     Running             0          13m     10.244.87.57   liuxinle-optiplex-5060   <none>           <none>
c0-greatdb-0                            0/2     ContainerCreating   0          2s      <none>         liuxinle-optiplex-5060   <none>           <none>
c0-greatdb-1                            0/2     ContainerCreating   0          2s      <none>         liuxinle-optiplex-5060   <none>           <none>
c0-glob0-mysql-0                        2/2     Running             0          14m     10.244.87.51   liuxinle-optiplex-5060   <none>           <none>
c0-glob0-mysql-1                        2/2     Running             0          14m     10.244.87.41   liuxinle-optiplex-5060   <none>           <none>
c0-glob0-mysql-2                        2/2     Running             0          13m     10.244.87.60   liuxinle-optiplex-5060   <none>           <none>
c0-nor0-mysql-0                         2/2     Running             0          14m     10.244.87.29   liuxinle-optiplex-5060   <none>           <none>
c0-nor0-mysql-1                         2/2     Running             0          14m     10.244.87.4    liuxinle-optiplex-5060   <none>           <none>
c0-nor0-mysql-2                         2/2     Running             0          13m     10.244.87.25   liuxinle-optiplex-5060   <none>           <none>
c0-par0-mysql-0                         2/2     Running             0          14m     10.244.87.55   liuxinle-optiplex-5060   <none>           <none>
c0-par0-mysql-1                         2/2     Running             0          14m     10.244.87.13   liuxinle-optiplex-5060   <none>           <none>
c0-par0-mysql-2                         2/2     Running             0          13m     10.244.87.21   liuxinle-optiplex-5060   <none>           <none>
c0-prometheus-server-6697649b76-fkvh9   2/2     Running             0          9m24s   10.244.87.37   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-0                          1/1     Running             1          12m     10.244.87.44   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-1                          1/1     Running             0          11m     10.244.87.30   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-2                          1/1     Running             0          10m     10.244.87.49   liuxinle-optiplex-5060   <none>           <none>
c0-zookeeper-initzk-7hbfs               0/1     Completed           0          12m     10.244.87.17   liuxinle-optiplex-5060   <none>           <none>
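The other fault types mentioned above are declared in the same way. Purely as an illustrative sketch (the action, mode, and delay values below are assumptions, not part of the original test plan), a NetworkChaos experiment that injects network latency into the same greatdb pods could look like this:

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos               # network fault type
metadata:
  name: network-delay-example
  namespace: test
spec:
  action: delay                  # inject latency; other actions include partition and loss
  mode: one                      # pick one pod that matches the selector
  selector:
    labelSelectors:
      "app.kubernetes.io/component": "greatdb"
  delay:
    latency: "100ms"
    jitter: "10ms"
  duration: "30s"

When an experiment is no longer needed, it can be removed with kubectl delete -n test -f pod-kill.yaml (or by deleting the corresponding chaos object directly).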
4. Orchestrating the Test Workflow with Argo
Argo is an open-source, container-native workflow engine for running jobs on Kubernetes. It models a multi-step workflow as a sequence of tasks, which is what we use to orchestrate the test process.
We use Argo to define a test job; the basic test flow is fixed, as follows:
Step 1 of the test flow deploys the test cluster. Then two tasks run in parallel: step 2 runs the test case to simulate a business workload, while step 3 injects faults with Chaos Mesh. Once the test case in step 2 finishes, step 4 stops the fault injection, and finally step 5 cleans up the cluster environment.
4.1 Orchestrating a Chaos Test Workflow with Argo (Using the flashback Test Case as an Example)
1) Change the image information in cluster-setup.yaml to the name and tag of the cluster setup/cleanup image you pushed in section 2.2 "Preparing the Images Required for Testing".
2) Change the image information in testsuite-flashback.yaml to the name and tag of the test case image you pushed in section 2.2.
3) Create resources from the cluster setup, test case, and tool template YAML files with kubectl apply -n argo -f xxx.yaml (these files define Argo templates for use when writing workflows):
kubectl apply -n argo -f cluster-setup.yaml
kubectl apply -n argo -f testsuite-flashback.yaml
kubectl apply -n argo -f tools-template.yaml
4) Copy the workflow template file workflow-template.yaml, change the parts marked by comments to your own settings, and then run the following command to create the chaos test workflow:
kubectl apply -n argo -f workflow-template.yaml
Here is a sample workflow template file:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: chaostest-c0-0-
  name: chaostest-c0-0
  namespace: argo
spec:
  entrypoint: test-entry   # test entry point; pass the test parameters here: clustername, namespace, host, the greatdb image name and tag, and other basic settings
  serviceAccountName: argo
  arguments:
    parameters:
      - name: clustername
        value: c0
      - name: namespace
        value: test
      - name: host
        value: 172.16.70.249
      - name: port
        value: 30901
      - name: password
        value: Bgview@2020
      - name: user
        value: root
      - name: run-time
        value: 10m
      - name: greatdb-repository
        value: greatdb/greatdb
      - name: greatdb-tag
        value: tag202110
      - name: nemesis
        value: kill_mysql_normal_master,kill_mysql_normal_slave,kill_mysql_partition_master,kill_mysql_partition_slave,kill_mysql_auth_master,kill_mysql_auth_slave,kill_mysql_global_master,kill_mysql_global_slave,kill_mysql_master,kill_mysql_slave,net_partition_mysql_normal,net_partition_mysql_partition,net_partition_mysql_auth,net_partition_mysql_global
      - name: mysql-partition
        value: 1
      - name: mysql-global
        value: 1
      - name: mysql-auth
        value: 1
      - name: mysql-normal
        value: 2
  templates:
    - name: test-entry
      steps:
        - - name: setup-greatdb-cluster   # step.1 cluster setup; specify the correct parameters, mainly the MySQL and ZooKeeper image names and tags
            templateRef:
              name: cluster-setup-template
              template: cluster-setup
            arguments:
              parameters:
                - name: namespace
                  value: "{{workflow.parameters.namespace}}"
                - name: clustername
                  value: "{{workflow.parameters.clustername}}"
                - name: mysql-image
                  value: mysql:5.7.34
                - name: mysql-replica
                  value: 3
                - name: mysql-auth
                  value: "{{workflow.parameters.mysql-auth}}"
                - name: mysql-normal
                  value: "{{workflow.parameters.mysql-normal}}"
                - name: mysql-partition
                  value: "{{workflow.parameters.mysql-partition}}"
                - name: mysql-global
                  value: "{{workflow.parameters.mysql-global}}"
                - name: enable-monitor
                  value: false
                - name: zookeeper-repository
                  value: zookeeper
                - name: zookeeper-tag
                  value: 3.5.5
                - name: zookeeper-replica
                  value: 3
                - name: greatdb-repository
                  value: "{{workflow.parameters.greatdb-repository}}"
                - name: greatdb-tag
                  value: "{{workflow.parameters.greatdb-tag}}"
                - name: greatdb-replica
                  value: 3
                - name: greatdb-serviceHost
                  value: "{{workflow.parameters.host}}"
                - name: greatdb-servicePort
                  value: "{{workflow.parameters.port}}"
        - - name: run-flashbacktest   # step.2 run the test case; replace with the template of the test case you want to run and specify the correct parameters, mainly the number and size of the tables used by the test
            templateRef:
              name: flashback-test-template
              template: flashback
            arguments:
              parameters:
                - name: user
                  value: "{{workflow.parameters.user}}"
                - name: password
                  value: "{{workflow.parameters.password}}"
                - name: host
                  value: "{{workflow.parameters.host}}"
                - name: port
                  value: "{{workflow.parameters.port}}"
                - name: concurrency
                  value: 16
                - name: size
                  value: 10000
                - name: tables
                  value: 10
                - name: run-time
                  value: "{{workflow.parameters.run-time}}"
                - name: single-statement
                  value: true
                - name: manage-statement
                  value: true
          - name: invoke-chaos-for-flashabck-test   # step.3 inject faults; specify the correct parameters; run-time and interval define how long and how often faults are injected, so a separate "stop fault injection" step is omitted
            templateRef:
              name: chaos-rto-template
              template: chaos-rto
            arguments:
              parameters:
                - name: user
                  value: "{{workflow.parameters.user}}"
                - name: host
                  value: "{{workflow.parameters.host}}"
                - name: password
                  value: "{{workflow.parameters.password}}"
                - name: port
                  value: "{{workflow.parameters.port}}"
                - name: k8s-config
                  value: /root/.kube/config
                - name: namespace
                  value: "{{workflow.parameters.namespace}}"
                - name: clustername
                  value: "{{workflow.parameters.clustername}}"
                - name: prometheus
                  value: ''
                - name: greatdb-job
                  value: greatdb-monitor-greatdb
                - name: nemesis
                  value: "{{workflow.parameters.nemesis}}"
                - name: nemesis-duration
                  value: 1m
                - name: nemesis-mode
                  value: default
                - name: wait-time
                  value: 5m
                - name: check-time
                  value: 5m
                - name: nemesis-scope
                  value: 1
                - name: nemesis-log
                  value: true
                - name: enable-monitor
                  value: false
                - name: run-time
                  value: "{{workflow.parameters.run-time}}"
                - name: interval
                  value: 1m
                - name: monitor-log
                  value: false
                - name: enable-rto
                  value: false
                - name: rto-qps
                  value: 0.1
                - name: rto-warm
                  value: 5m
                - name: rto-time
                  value: 1m
                - name: log-level
                  value: debug
        - - name: flashbacktest-output   # output whether the test case passed
            templateRef:
              name: tools-template
              template: output-result
            arguments:
              parameters:
                - name: info
                  value: "flashback test pass, with nemesis: {{workflow.parameters.nemesis}}"
        - - name: clean-greatdb-cluster   # step.4 clean up the test cluster; the parameters match those of step.1
            templateRef:
              name: cluster-setup-template
              template: cluster-setup
            arguments:
              parameters:
                - name: namespace
                  value: "{{workflow.parameters.namespace}}"
                - name: clustername
                  value: "{{workflow.parameters.clustername}}"
                - name: mysql-image
                  value: mysql:5.7
                - name: mysql-replica
                  value: 3
                - name: mysql-auth
                  value: "{{workflow.parameters.mysql-auth}}"
                - name: mysql-normal
                  value: "{{workflow.parameters.mysql-normal}}"
                - name: mysql-partition
                  value: "{{workflow.parameters.mysql-partition}}"
                - name: mysql-global
                  value: "{{workflow.parameters.mysql-global}}"
                - name: enable-monitor
                  value: false
                - name: zookeeper-repository
                  value: zookeeper
                - name: zookeeper-tag
                  value: 3.5.5
                - name: zookeeper-replica
                  value: 3
                - name: greatdb-repository
                  value: "{{workflow.parameters.greatdb-repository}}"
                - name: greatdb-tag
                  value: "{{workflow.parameters.greatdb-tag}}"
                - name: greatdb-replica
                  value: 3
                - name: greatdb-serviceHost
                  value: "{{workflow.parameters.host}}"
                - name: greatdb-servicePort
                  value: "{{workflow.parameters.port}}"
                - name: clean
                  value: true
        - - name: echo-result
            templateRef:
              name: tools-template
              template: echo
            arguments:
              parameters:
                - name: info
                  value: "{{item}}"
            withItems:
              - "{{steps.flashbacktest-output.outputs.parameters.result}}"
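Once the workflow has been created, its progress can be followed from the command line; a brief sketch, assuming the argo CLI is installed (kubectl get workflow -n argo works as well):

argo list -n argo                    # list workflows and their phases
argo get chaostest-c0-0 -n argo      # show the step tree of this workflow
argo logs chaostest-c0-0 -n argo     # stream logs from the workflow's pods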
Enjoy GreatSQL :)
About GreatSQL
GreatSQL is a MySQL branch maintained by Wanli Database (GreatDB), focused on improving MGR reliability and performance and supporting the InnoDB parallel query feature; it is a MySQL branch suitable for financial-grade applications.
Gitee:
https://gitee.com/GreatSQL/Gr...
GitHub:
https://github.com/GreatSQL/G...
Bilibili:
https://space.bilibili.com/13...
WeChat & QQ groups:
Search for and add the GreatSQL community assistant on WeChat, then send the verification message "加群" (join group) to join the GreatSQL/MGR WeChat discussion group.
QQ group: 533341697
WeChat assistant: wanlidbc