Chaos Engineering with ChaosToolkit, Part 1: Deleting a K8s Pod


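Today let's try out chaostoolkit, an open-source tool for chaos engineering.

Its goal is to provide a free, open, community-driven toolkit and API.

Official source code: https://github.com/chaostoolk…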

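To understand this tool, you need to know the key points laid out in the principles of chaos engineering, shown below:

Remember the first point mentioned here: build a steady-state hypothesis.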

Before running the tool, let's look at its architecture.

Put simply, ChaosToolkit operates on your system under test through Drivers.

Its features include the following:


Now let's install the tool and try it out.

Environment: CentOS 7.8, k8s 1.19.5, and a sample application

Install python3
sudo yum install python3 python3-venv
Install pipenv
gaolou@GaoMacPro ~ % pip3 install pipenv
Install chaos-toolkit together with its k8s extension and reporting module
pip3 install -U chaostoolkit
pip3 install -U chaostoolkit-kubernetes
pip3 install -U chaostoolkit-reporting
If you need to work with other platforms, you can install the corresponding extensions as well.

Create a virtual environment
python3 -m venv .bundler
source .bundler/bin/activate
To avoid affecting other environments, we work inside a Python virtual environment here.
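
A quick way to confirm that the shell really is inside the virtual environment, and that the toolkit modules can be imported from it, is a small Python check. This is only a minimal sketch: `chaostoolkit` and `chaosk8s` are the modules behind the packages installed above (`chaosk8s` also shows up in the discover log later on), while `chaosreport` is my assumption for the module shipped by chaostoolkit-reporting.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the system installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_modules(names):
    """Map each top-level module name to True if it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    # "chaosreport" is an assumed module name for chaostoolkit-reporting.
    wanted = ["chaostoolkit", "chaosk8s", "chaosreport"]
    for name, found in check_modules(wanted).items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

If a module is reported as missing while the virtual environment is active, the package was probably installed outside of it; running the pip3 commands again after `source .bundler/bin/activate` should fix that.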

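The installation above was carried out on the k8s master machine. If you are not installing on a k8s machine, you can configure the corresponding k8s context instead; for details, see: https://chaostoolkit.org/driv….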

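chaos discover: exploring what can be done
First run the discover command. chaostoolkit uses the contents of ~/.kube/config to generate a discovery.json file, which contains the full set of operations that can be performed against k8s. A successful run looks like this: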

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos discover chaostoolkit-kubernetes
[2021-06-23 12:18:07 INFO] Attempting to download and install package ‘chaostoolkit-kubernetes’
[2021-06-23 12:18:08 INFO] Package downloaded and installed in current environment
[2021-06-23 12:18:09 INFO] Discovering capabilities from chaostoolkit-kubernetes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.pod.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.pod.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.replicaset.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.statefulset.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.statefulset.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.crd.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.crd.probes
[2021-06-23 12:18:09 INFO] Discovery outcome saved in ./discovery.json
(.bundler) [root@s5 chaostoolkit_scenarios]#
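
To get a quick overview of what was discovered, the file can be summarised with a few lines of Python. This is a hedged sketch rather than a documented schema: I am assuming that discovery.json has a top-level `activities` list whose entries carry `type` and `name` keys, and the sample data below is hand-written for illustration, not real tool output.

```python
import json
from collections import defaultdict


def group_activities(discovery):
    """Group activity names by their type ("action" or "probe")."""
    grouped = defaultdict(list)
    # Assumed layout: a top-level "activities" list of objects that
    # each carry at least "type" and "name" keys.
    for activity in discovery.get("activities", []):
        kind = activity.get("type", "unknown")
        grouped[kind].append(activity.get("name", "?"))
    return {kind: sorted(names) for kind, names in grouped.items()}


def load_discovery(path="discovery.json"):
    """Read the file written by `chaos discover`."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, not real tool output.
    sample = {
        "activities": [
            {"type": "action", "name": "delete_pods"},
            {"type": "probe", "name": "count_pods"},
            {"type": "probe", "name": "all_microservices_healthy"},
        ]
    }
    for kind, names in group_activities(sample).items():
        print(kind, names)
```

On a real run you would pass `load_discovery()` to `group_activities` instead of the sample; if the keys differ in your version of the tool, adjust the two `get` calls accordingly.
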
chaos init: generating an experiment

Run the init command and follow the prompts to create a chaos experiment.

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos init
You are about to create an experiment.
This wizard will walk you through each step so that you can build
the best experiment for your needs.

An experiment is made up of three elements:

  • a steady-state hypothesis [OPTIONAL]
  • an experimental method
  • a set of rollback activities [OPTIONAL]

Only the method is required. Also your experiment will
not run unless you define at least one activity (probe or action)
within it
Experiment’s title: E2 # set the experiment's name here

A steady state hypothesis defines what ‘normality’ looks like in your system
The steady state hypothesis is a collection of conditions that are used,
at the beginning of an experiment, to decide if the system is in a recognised
‘normal’ state. The steady state conditions are then used again when your experiment
is complete to detect where your system may have deviated in an interesting,
weakness-detecting way

Initially you may not know what your steady state hypothesis is
and so instead you might create an experiment without one
This is why the stead state hypothesis is optional.
Do you want to define a steady state hypothesis now? [y/N]: y # create a steady-state hypothesis; note that this is an important concept in chaos engineering, yet most other chaos tools have no such step
Hypothesis’s title: H2

You may now define probes that will determine
the steady-state of your system.
Add an activity
1) all_microservices_healthy
2) deployment_is_fully_available
3) deployment_is_not_fully_available
4) microservice_available_and_healthy
5) microservice_is_not_available
6) read_microservices_logs
7) service_endpoint_is_initialized
8) count_pods
9) pod_is_not_available
10) pods_in_conditions
11) pods_in_phase
12) pods_not_in_phase
13) read_pod_logs
14) statefulset_fully_available
15) statefulset_not_fully_available
16) get_cluster_custom_object
17) get_custom_object
18) list_cluster_custom_objects
19) list_custom_objects
Activity (0 to escape): 1 # choose the check for the steady-state hypothesis; put simply, this defines an expected result

!!!DEPRECATED!!!
Do you want to use this probe? [y/N]: y # confirm whether to use the probe selected above

A steady-state probe requires a tolerance value, within which
your system is in a reognised normal state.

What is the tolerance for this probe?: normal

You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.
Argument’s value for ‘ns’ [default]: chaosnamespace # enter the k8s namespace to operate on
Do you want to select another activity? [y/N]: y # whether to select another activity
Add an activity
1) all_microservices_healthy
2) deployment_is_fully_available
3) deployment_is_not_fully_available
4) microservice_available_and_healthy
5) microservice_is_not_available
6) read_microservices_logs
7) service_endpoint_is_initialized
8) count_pods
9) pod_is_not_available
10) pods_in_conditions
11) pods_in_phase
12) pods_not_in_phase
13) read_pod_logs
14) statefulset_fully_available
15) statefulset_not_fully_available
16) get_cluster_custom_object
17) get_custom_object
18) list_cluster_custom_objects
19) list_custom_objects
Activity (0 to escape): 1 # choose the specific activity

!!!DEPRECATED!!!
Do you want to use this probe? [y/N]: y # confirm using the activity selected above

You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.
Argument’s value for ‘ns’ [default]:
Do you want to select another activity? [y/N]: N # whether to add another activity; I don't add any more here

An experiment’s method contains actions and probes. Actions
vary real-world events in your system to determine if your
steady-state hypothesis is maintained when those events occur.

An experimental method can also contain probes to gather additional
information about your system as your method is executed.
Do you want to define an experimental method? [y/N]: y # define the concrete method for the experiment
Add an activity
1) kill_microservice
2) remove_service_endpoint
3) scale_microservice
4) start_microservice
5) all_microservices_healthy
6) deployment_is_fully_available
7) deployment_is_not_fully_available
8) microservice_available_and_healthy
9) microservice_is_not_available
10) read_microservices_logs
11) service_endpoint_is_initialized
12) create_deployment
13) delete_deployment
14) scale_deployment
15) deployment_available_and_healthy
16) deployment_fully_available
17) deployment_not_fully_available
18) cordon_node
19) create_node
20) delete_nodes
21) drain_nodes
22) uncordon_node
23) get_nodes
24) delete_pods
25) exec_in_pods
26) terminate_pods
27) count_pods
28) pod_is_not_available
29) pods_in_conditions
30) pods_in_phase
31) pods_not_in_phase
32) read_pod_logs
33) delete_replica_set
34) create_service_endpoint
35) delete_service
36) service_is_initialized
37) create_statefulset
38) remove_statefulset
39) scale_statefulset
40) statefulset_fully_available
41) statefulset_not_fully_available
42) create_cluster_custom_object
43) create_custom_object
44) delete_cluster_custom_object
45) delete_custom_object
46) patch_cluster_custom_object
47) patch_custom_object
48) replace_cluster_custom_object
49) replace_custom_object
50) get_cluster_custom_object
51) get_custom_object
52) list_cluster_custom_objects
53) list_custom_objects
Activity (0 to escape): 24 # here I choose activity 24: delete a pod

!!!DEPRECATED!!!
Do you want to use this action? [y/N]: y # confirm the choice

You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.
Argument’s value for ‘name’: DeleteRedisPOD # give this action a name

Argument’s value for ‘ns’ [default]: chaosnamespace # the k8s namespace to operate on
Argument’s value for ‘label_selector’ [name in ({name})]: app=redis # enter the label of the target objects so that they can be found
Do you want to select another activity? [y/N]: N # whether to add another activity; I don't add any more

An experiment may optionally define a set of remedial actions
that are used to rollback the system to a given state.
Do you want to add some rollbacks now? [y/N]: N # whether to add rollback actions; I am deleting the redis pods, and since k8s brings them back up automatically, I don't need a rollback

Experiment created and saved in ‘./experiment.json’ # the experiment file has been generated
(.bundler) [root@s5 chaostoolkit_scenarios]#
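
The generated file is not shown in this article, so here is a hedged sketch of roughly what an experiment with these answers looks like in the ChaosToolkit JSON format, built as a Python dict and dumped to JSON. The module paths come from the discover log above. Two things are my own assumptions: the `description` text, and the tolerance, which I write as `true` because the probe result is compared against it (the wizard transcript answered `normal`, so the real file may differ). Only one of the two hypothesis probes is shown for brevity.

```python
import json


def build_experiment():
    """Return a dict shaped like the experiment described by the wizard answers."""
    return {
        "version": "1.0.0",
        "title": "E2",
        "description": "N/A",  # assumed placeholder text
        "steady-state-hypothesis": {
            "title": "H2",
            "probes": [
                {
                    "type": "probe",
                    "name": "all_microservices_healthy",
                    "tolerance": True,  # assumption, see the note above
                    "provider": {
                        "type": "python",
                        "module": "chaosk8s.probes",
                        "func": "all_microservices_healthy",
                        "arguments": {"ns": "chaosnamespace"},
                    },
                }
            ],
        },
        "method": [
            {
                "type": "action",
                "name": "delete_pods",
                "provider": {
                    "type": "python",
                    "module": "chaosk8s.pod.actions",
                    "func": "delete_pods",
                    "arguments": {
                        "name": "DeleteRedisPOD",
                        "ns": "chaosnamespace",
                        "label_selector": "app=redis",
                    },
                },
            }
        ],
        "rollbacks": [],  # none declared: k8s recreates the pods itself
    }


if __name__ == "__main__":
    print(json.dumps(build_experiment(), indent=2))
```

Comparing this sketch with your own experiment.json is a handy way to see how each wizard answer maps to a field, and editing the JSON directly is usually quicker than rerunning the wizard.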

chaos run: executing the experiment
(.bundler) [root@s5 chaostoolkit_scenarios]# chaos run experiment.json
[2021-06-28 23:03:23 INFO] Validating the experiment’s syntax
[2021-06-28 23:03:24 INFO] Experiment looks valid
[2021-06-28 23:03:24 INFO] Running experiment: E2
[2021-06-28 23:03:24 INFO] Steady-state strategy: default
[2021-06-28 23:03:24 INFO] Rollbacks strategy: default
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Playing your experiment’s method now…
[2021-06-28 23:03:24 INFO] Action: delete_pods
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Let’s rollback…
[2021-06-28 23:03:24 INFO] No declared rollbacks, let’s move on.
[2021-06-28 23:03:24 INFO] Experiment ended with status: completed
(.bundler) [root@s5 chaostoolkit_scenarios]#
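
The order of steps in this log (hypothesis check, method, hypothesis check again, rollbacks) can be modelled in a few lines of Python. This is a simplified illustration of the flow, not chaostoolkit's actual implementation; the function name and the status strings other than `completed` (the only one that appears in the log) are chosen by me for the example.

```python
def run_experiment(hypothesis_probes, method_actions, rollbacks=()):
    """Simplified model of the step order seen in the run log.

    hypothesis_probes: list of (callable, tolerance) pairs.
    method_actions, rollbacks: sequences of zero-argument callables.
    """

    def steady_state_met():
        # Every probe result must equal its tolerance value.
        return all(probe() == tolerance for probe, tolerance in hypothesis_probes)

    # 1. Check the hypothesis first: if the system is not "normal"
    #    to begin with, nothing is injected.
    if not steady_state_met():
        return "failed"

    try:
        # 2. Play the method (the chaos actions).
        for action in method_actions:
            action()
        # 3. Check the hypothesis again to detect a deviation.
        status = "completed" if steady_state_met() else "deviated"
    finally:
        # 4. Rollbacks run last, if any were declared.
        for rollback in rollbacks:
            rollback()
    return status


if __name__ == "__main__":
    # Toy stand-in for the cluster: deleted pods are "recreated" at once.
    cluster = {"redis_pods": 3}

    def delete_pods():
        cluster["redis_pods"] = 0
        cluster["redis_pods"] = 3  # pretend k8s brought them back

    probe = (lambda: cluster["redis_pods"] == 3, True)
    print(run_experiment([probe], [delete_pods]))
```

The model also shows why the second hypothesis check matters: it is the only place where the effect of the injected fault is compared against the expected normal state.
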
Checking the results
Before running the experiment:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
………………………
redis-master-b96c9795b-nqzmr 1/1 Running 0 3d9h 10.100.220.84 s6 <none> <none>
redis-slave-6b8d456947-6r42k 1/1 Running 0 3d9h 10.100.220.86 s6 <none> <none>
redis-slave-6b8d456947-z55m5 1/1 Running 0 3d9h 10.100.53.206 s7 <none> <none>

After running the experiment:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
………………………….
redis-master-b96c9795b-92rc6 0/1 ContainerCreating 0 3s <none> s6 <none> <none>
redis-master-b96c9795b-nqzmr 0/1 Terminating 0 3d9h 10.100.220.84 s6 <none> <none>
redis-slave-6b8d456947-5m2xt 0/1 ContainerCreating 0 2s <none> s6 <none> <none>
redis-slave-6b8d456947-6r42k 1/1 Terminating 0 3d9h 10.100.220.86 s6 <none> <none>
redis-slave-6b8d456947-fj4xc 0/1 ContainerCreating 0 3s <none> s7 <none> <none>
redis-slave-6b8d456947-z55m5 1/1 Terminating 0 3d9h 10.100.53.206 s7 <none> <none>

After the pods have fully started:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
…………………..
redis-master-b96c9795b-92rc6 1/1 Running 0 5m43s 10.100.220.89 s6 <none> <none>
redis-slave-6b8d456947-5m2xt 1/1 Running 0 5m42s 10.100.220.90 s6 <none> <none>
redis-slave-6b8d456947-fj4xc 1/1 Running 0 5m43s 10.100.53.211 s7 <none> <none>
[root@s5 ~]#

As the results above show, the experiment ran successfully: all of the redis pods were killed and then brought back up by k8s.
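
Rather than comparing the listings by eye, the same conclusion can be checked with a short Python script. This is a minimal sketch that parses the `kubectl get pods` text shown above (the redis rows are copied from this article); the helper names `pod_names` and `all_replaced` are my own.

```python
def pod_names(kubectl_output, status=None):
    """Extract pod names from `kubectl get pods` text, optionally by STATUS."""
    names = set()
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and elision lines.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        if status is None or fields[2] == status:
            names.add(fields[0])
    return names


def all_replaced(before, after):
    """True if every Running pod was replaced by a differently named one."""
    old = pod_names(before, "Running")
    new = pod_names(after, "Running")
    return len(old) == len(new) and old.isdisjoint(new)


BEFORE = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-master-b96c9795b-nqzmr 1/1 Running 0 3d9h 10.100.220.84 s6 <none> <none>
redis-slave-6b8d456947-6r42k 1/1 Running 0 3d9h 10.100.220.86 s6 <none> <none>
redis-slave-6b8d456947-z55m5 1/1 Running 0 3d9h 10.100.53.206 s7 <none> <none>
"""

AFTER = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
redis-master-b96c9795b-92rc6 1/1 Running 0 5m43s 10.100.220.89 s6 <none> <none>
redis-slave-6b8d456947-5m2xt 1/1 Running 0 5m42s 10.100.220.90 s6 <none> <none>
redis-slave-6b8d456947-fj4xc 1/1 Running 0 5m43s 10.100.53.211 s7 <none> <none>
"""

if __name__ == "__main__":
    print("all redis pods replaced:", all_replaced(BEFORE, AFTER))
```

The check relies on the fact that a recreated pod gets a new random name suffix, so a disjoint set of names with the same count means every pod was deleted and recreated.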

That's the only experiment for today; you can follow the same steps to generate other experiments.
