Hi everyone, I'm Zhang Jintao.

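I previously wrote an article, "A More Elegant Approach to Measuring Kubernetes Cluster Events", which uses Jaeger to collect the events in a Kubernetes cluster as traces and display them. The end result looked like this: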

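When I wrote that article I promised to explain the principles behind it in detail. I have put it off for a long time, and now that the year is ending, it is time to deliver.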

Events overview

Let's start with a simple example to see what events in a Kubernetes cluster are.

Create a new namespace named moelove, then create a deployment named redis in it. Next, list all the events in that namespace.

(MoeLove) ➜ kubectl create ns moelove
namespace/moelove created
(MoeLove) ➜ kubectl -n moelove create deployment redis --image=ghcr.io/moelove/redis:alpine
deployment.apps/redis created
(MoeLove) ➜ kubectl -n moelove get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
redis   1/1     1            1           11s
(MoeLove) ➜ kubectl -n moelove get events
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
21s         Normal   Scheduled           pod/redis-687967dbc5-27vmr    Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
21s         Normal   Pulling             pod/redis-687967dbc5-27vmr    Pulling image "ghcr.io/moelove/redis:alpine"
15s         Normal   Pulled              pod/redis-687967dbc5-27vmr    Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
14s         Normal   Created             pod/redis-687967dbc5-27vmr    Created container redis
14s         Normal   Started             pod/redis-687967dbc5-27vmr    Started container redis
22s         Normal   SuccessfulCreate    replicaset/redis-687967dbc5   Created pod: redis-687967dbc5-27vmr
22s         Normal   ScalingReplicaSet   deployment/redis              Scaled up replica set redis-687967dbc5 to 1

However, by default kubectl get events does not list events in the order they occurred, so we usually need to add the --sort-by='{.metadata.creationTimestamp}' flag to get output sorted by time.

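This is also why Kubernetes v1.23 added the kubectl alpha events command. I covered it in detail in my earlier article "K8S Ecosystem Weekly | Kubernetes v1.23.0 officially released, an overview of new features", so I won't go into it here.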

Sorted by time, the result looks like this:

(MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}'
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
2m12s       Normal   Scheduled           pod/redis-687967dbc5-27vmr    Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
2m13s       Normal   SuccessfulCreate    replicaset/redis-687967dbc5   Created pod: redis-687967dbc5-27vmr
2m13s       Normal   ScalingReplicaSet   deployment/redis              Scaled up replica set redis-687967dbc5 to 1
2m12s       Normal   Pulling             pod/redis-687967dbc5-27vmr    Pulling image "ghcr.io/moelove/redis:alpine"
2m6s        Normal   Pulled              pod/redis-687967dbc5-27vmr    Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
2m5s        Normal   Created             pod/redis-687967dbc5-27vmr    Created container redis
2m5s        Normal   Started             pod/redis-687967dbc5-27vmr    Started container redis

From the steps above we can see that events are in fact a kind of resource in a Kubernetes cluster. When the state of a resource in the cluster changes, new events can be generated.

A deeper look at Events

A single Event object

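Since events are a resource in a Kubernetes cluster, each one should normally carry its name in metadata.name so that it can be operated on individually. We can print the names with the following command: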

(MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}' -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
redis-687967dbc5-27vmr.16c4fb7bde8c69d2
redis-687967dbc5.16c4fb7bde6b54c4
redis.16c4fb7bde1bf769
redis-687967dbc5-27vmr.16c4fb7bf8a0ab35
redis-687967dbc5-27vmr.16c4fb7d8ecaeff8
redis-687967dbc5-27vmr.16c4fb7d99709da9
redis-687967dbc5-27vmr.16c4fb7d9be30c06

Pick any one of these event records and print it as YAML:

(MoeLove) ➜ kubectl -n moelove get events redis-687967dbc5-27vmr.16c4fb7bde8c69d2 -o yaml
action: Binding
apiVersion: v1
eventTime: "2021-12-28T19:31:13.702987Z"
firstTimestamp: null
involvedObject:
  apiVersion: v1
  kind: Pod
  name: redis-687967dbc5-27vmr
  namespace: moelove
  resourceVersion: "330230"
  uid: 71b97182-5593-47b2-88cc-b3f59618c7aa
kind: Event
lastTimestamp: null
message: Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
metadata:
  creationTimestamp: "2021-12-28T19:31:13Z"
  name: redis-687967dbc5-27vmr.16c4fb7bde8c69d2
  namespace: moelove
  resourceVersion: "330235"
  uid: e5c03126-33b9-4559-9585-5e82adcd96b0
reason: Scheduled
reportingComponent: default-scheduler
reportingInstance: default-scheduler-kind-control-plane
source: {}
type: Normal

It contains a lot of information, which we won't unpack just yet. Let's look at another example first.

Events in kubectl describe

We can run describe on the Deployment object and on the Pod object separately, which gives the following results (the middle of the output is omitted):

  • On the Deployment
(MoeLove) ➜ kubectl -n moelove describe deploy/redis
Name:                   redis
Namespace:              moelove
...
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  15m   deployment-controller  Scaled up replica set redis-687967dbc5 to 1
  • On the Pod
(MoeLove) ➜ kubectl -n moelove describe pods redis-687967dbc5-27vmr
Name:         redis-687967dbc5-27vmr
Namespace:    moelove
Priority:     0
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  18m   default-scheduler  Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
  Normal  Pulling    18m   kubelet            Pulling image "ghcr.io/moelove/redis:alpine"
  Normal  Pulled     17m   kubelet            Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
  Normal  Created    17m   kubelet            Created container redis
  Normal  Started    17m   kubelet            Started container redis

Notice that when we describe different resource objects, the events shown are only those directly related to that object. When describing the Deployment, we do not see the Pod's events.

This tells us that an Event object carries information about the resource object it describes, and that the two are directly linked.

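Looking back at the single Event object we saw earlier, the involvedObject field holds exactly that: the information about the resource object associated with the Event.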
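The link between an Event and its resource can be illustrated with a small Go sketch that selects only the events whose involvedObject matches a given object. The objectRef and event types are hypothetical, simplified stand-ins, and matching on kind, namespace, and name alone is an assumption made for brevity rather than a description of what kubectl describe actually does.

```go
package main

import "fmt"

// objectRef is a hypothetical, trimmed-down version of involvedObject.
type objectRef struct {
	Kind      string
	Namespace string
	Name      string
}

// event is a hypothetical, trimmed-down stand-in for a Kubernetes Event.
type event struct {
	InvolvedObject objectRef
	Reason         string
}

// eventsFor returns the events whose involvedObject equals ref.
func eventsFor(events []event, ref objectRef) []event {
	var matched []event
	for _, e := range events {
		if e.InvolvedObject == ref {
			matched = append(matched, e)
		}
	}
	return matched
}

func main() {
	pod := objectRef{"Pod", "moelove", "redis-687967dbc5-27vmr"}
	deploy := objectRef{"Deployment", "moelove", "redis"}
	events := []event{
		{deploy, "ScalingReplicaSet"},
		{pod, "Scheduled"},
		{pod, "Pulling"},
	}
	// Only the Deployment's own event is selected; the Pod events are skipped.
	for _, e := range eventsFor(events, deploy) {
		fmt.Println(e.InvolvedObject.Kind, e.Reason)
	}
}
```

This mirrors what we observed above: asking for the Deployment yields only ScalingReplicaSet, while the Pod's Scheduled and Pulling events stay out of the result.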

Going further with Events

Now consider the following example, where we create a Deployment that uses an image that does not exist:

(MoeLove) ➜ kubectl -n moelove create deployment non-exist --image=ghcr.io/moelove/non-exist
deployment.apps/non-exist created
(MoeLove) ➜ kubectl -n moelove get pods
NAME                        READY   STATUS         RESTARTS   AGE
non-exist-d9ddbdd84-tnrhd   0/1     ErrImagePull   0          11s
redis-687967dbc5-27vmr      1/1     Running        0          26m

The new Pod is in the ErrImagePull state. Let's list the events in the namespace (I have omitted the earlier records for deploy/redis):

(MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}'
LAST SEEN   TYPE      REASON              OBJECT                           MESSAGE
35s         Normal    SuccessfulCreate    replicaset/non-exist-d9ddbdd84   Created pod: non-exist-d9ddbdd84-tnrhd
35s         Normal    ScalingReplicaSet   deployment/non-exist             Scaled up replica set non-exist-d9ddbdd84 to 1
35s         Normal    Scheduled           pod/non-exist-d9ddbdd84-tnrhd    Successfully assigned moelove/non-exist-d9ddbdd84-tnrhd to kind-worker3
17s         Warning   Failed              pod/non-exist-d9ddbdd84-tnrhd    Error: ErrImagePull
17s         Warning   Failed              pod/non-exist-d9ddbdd84-tnrhd    Failed to pull image "ghcr.io/moelove/non-exist": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/moelove/non-exist:latest": failed to resolve reference "ghcr.io/moelove/non-exist:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
18s         Normal    Pulling             pod/non-exist-d9ddbdd84-tnrhd    Pulling image "ghcr.io/moelove/non-exist"
4s          Warning   Failed              pod/non-exist-d9ddbdd84-tnrhd    Error: ImagePullBackOff
4s          Normal    BackOff             pod/non-exist-d9ddbdd84-tnrhd    Back-off pulling image "ghcr.io/moelove/non-exist"

Run describe on this Pod:

(MoeLove) ➜ kubectl -n moelove describe pods non-exist-d9ddbdd84-tnrhd
...
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m                     default-scheduler  Successfully assigned moelove/non-exist-d9ddbdd84-tnrhd to kind-worker3
  Normal   Pulling    2m22s (x4 over 3m59s)  kubelet            Pulling image "ghcr.io/moelove/non-exist"
  Warning  Failed     2m21s (x4 over 3m59s)  kubelet            Failed to pull image "ghcr.io/moelove/non-exist": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/moelove/non-exist:latest": failed to resolve reference "ghcr.io/moelove/non-exist:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
  Warning  Failed     2m21s (x4 over 3m59s)  kubelet            Error: ErrImagePull
  Warning  Failed     2m9s (x6 over 3m58s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    115s (x7 over 3m58s)   kubelet            Back-off pulling image "ghcr.io/moelove/non-exist"

This output differs from that of the Pod that ran successfully. The main difference is in the Age column, where we now see values such as 115s (x7 over 3m58s).

It means that this kind of event has occurred 7 times over the past 3m58s, and the most recent occurrence was 115s ago.

Yet when we ran kubectl get events directly, we did not see 7 repeated events. This shows that Kubernetes automatically merges duplicate events.

Select the last event (using the method described earlier) and print it as YAML:

(MoeLove) ➜ kubectl -n moelove get events non-exist-d9ddbdd84-tnrhd.16c4fce570cfba46 -o yaml
apiVersion: v1
count: 43
eventTime: null
firstTimestamp: "2021-12-28T19:57:06Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{non-exist}
  kind: Pod
  name: non-exist-d9ddbdd84-tnrhd
  namespace: moelove
  resourceVersion: "333366"
  uid: 33045163-146e-4282-b559-fec19a189a10
kind: Event
lastTimestamp: "2021-12-28T18:07:14Z"
message: Back-off pulling image "ghcr.io/moelove/non-exist"
metadata:
  creationTimestamp: "2021-12-28T19:57:06Z"
  name: non-exist-d9ddbdd84-tnrhd.16c4fce570cfba46
  namespace: moelove
  resourceVersion: "334638"
  uid: 60708be0-23b9-481b-a290-dd208fed6d47
reason: BackOff
reportingComponent: ""
reportingInstance: ""
source:
  component: kubelet
  host: kind-worker3
type: Normal

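Here we can see a count field, which records how many times this kind of event has occurred, along with firstTimestamp and lastTimestamp, which record when the event first occurred and when it most recently occurred. This explains the time span shown for the events in the earlier output.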

Understanding Events thoroughly

Below is one Event picked at random, showing some of the fields it contains:

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2021-12-28T19:31:13Z"
involvedObject:
  apiVersion: apps/v1
  kind: ReplicaSet
  name: redis-687967dbc5
  namespace: moelove
  resourceVersion: "330227"
  uid: 11e98a9d-9062-4ccb-92cb-f51cc74d4c1d
kind: Event
lastTimestamp: "2021-12-28T19:31:13Z"
message: 'Created pod: redis-687967dbc5-27vmr'
metadata:
  creationTimestamp: "2021-12-28T19:31:13Z"
  name: redis-687967dbc5.16c4fb7bde6b54c4
  namespace: moelove
  resourceVersion: "330231"
  uid: 8e37ec1e-b3a1-420c-96d4-3b3b2995c300
reason: SuccessfulCreate
reportingComponent: ""
reportingInstance: ""
source:
  component: replicaset-controller
type: Normal

The main fields have the following meanings:

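  • count: how many times this kind of event has occurred (covered above)
  • involvedObject: the resource object directly associated with this event (covered above), with the following structure: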
type ObjectReference struct {
    Kind            string
    Namespace       string
    Name            string
    UID             types.UID
    APIVersion      string
    ResourceVersion string
    FieldPath       string
}
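  • source: the component directly associated with the event, that is, the one that reported it, with the following structure: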
type EventSource struct {
    Component string
    Host      string
}
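  • reason: a short summary (or a fixed code), well suited for use as a filter condition and mainly intended to be machine-readable; there are currently more than 50 such codes
  • message: a more detailed, human-readable description
  • type: currently only two types exist, Normal and Warning, and the source code documents what each means: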
// staging/src/k8s.io/api/core/v1/types.go
const (
    // Information only and will not cause any problems
    EventTypeNormal string = "Normal"
    // These events are to warn that something might go wrong
    EventTypeWarning string = "Warning"
)
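Since type and reason are meant to be machine-readable, they make natural filter keys. The following Go sketch tallies Warning events by reason. The event struct and the lowercase constants are hypothetical stand-ins defined locally so the example is self-contained; it does not import the real Kubernetes API package.

```go
package main

import "fmt"

// Local copies of the two event types, so the sketch needs no dependencies.
const (
	eventTypeNormal  = "Normal"
	eventTypeWarning = "Warning"
)

// event is a hypothetical, trimmed-down stand-in for a Kubernetes Event.
type event struct {
	Type    string
	Reason  string
	Message string
}

// warningsByReason counts Warning events per reason and ignores Normal ones.
func warningsByReason(events []event) map[string]int {
	counts := make(map[string]int)
	for _, e := range events {
		if e.Type == eventTypeWarning {
			counts[e.Reason]++
		}
	}
	return counts
}

func main() {
	events := []event{
		{eventTypeNormal, "Pulling", `Pulling image "ghcr.io/moelove/non-exist"`},
		{eventTypeWarning, "Failed", "Error: ErrImagePull"},
		{eventTypeWarning, "Failed", "Error: ImagePullBackOff"},
		{eventTypeNormal, "BackOff", `Back-off pulling image "ghcr.io/moelove/non-exist"`},
	}
	for reason, n := range warningsByReason(events) {
		fmt.Println(reason, n)
	}
}
```

Note that the two Failed events carry different messages but the same reason, which is exactly why reason is the better key for filtering and message is better left for humans to read.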

So once we have collected all of these Events as tracing spans, we can classify them by source, correlate them by involvedObject, and order them by time.

Summary

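In this article I used two examples, a Deployment that deployed successfully and a Deployment that used a nonexistent image, to take an in-depth look at what Event objects actually do and what each of their fields means.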

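For Kubernetes, Events carry a lot of useful information, but that information has no effect on how Kubernetes itself behaves, and Events are not the actual Kubernetes logs. By default, events in Kubernetes are cleaned up after 1 hour in order to free the etcd resources they occupy.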

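So, to give cluster administrators a better picture of what has happened, in production we usually collect the Kubernetes cluster's events as well. The tool I personally recommend is: https://github.com/opsgenie/k...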

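Of course, you can also follow my earlier article, "A More Elegant Approach to Measuring Kubernetes Cluster Events", and use Jaeger to collect the events in a Kubernetes cluster as traces and display them.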


You are welcome to subscribe to my WeChat official account, MoeLove.