关于kubernetes:彻底搞懂-Kubernetes-中的-Events

2次阅读

共计 9540 个字符,预计需要花费 24 分钟才能阅读完成。

大家好,我是张晋涛。

之前我写了一篇《更优雅的 Kubernetes 集群事件度量计划》,利用 Jaeger 利用 tracing 的形式来采集 Kubernetes 集群中的 events 并进行展现。最终成果如下:

写那篇文章的时候,立了个 flag 要具体介绍下其中的原理,鸽了很久,当初年底了,也该收回来了。

Eents 概览

咱们先来做个简略的示例,来看看 Kubernetes 集群中的 events 是什么。

创立一个新的名叫 moelove 的 namespace,而后在其中创立一个叫做 redis 的 deployment。接下来查看这个 namespace 中的所有 events。

(MoeLove) ➜ kubectl create ns moelove
namespace/moelove created
(MoeLove) ➜ kubectl -n moelove create deployment redis --image=ghcr.io/moelove/redis:alpine 
deployment.apps/redis created
(MoeLove) ➜ kubectl -n moelove get deploy
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
redis   1/1     1            1           11s
(MoeLove) ➜ kubectl -n moelove get events
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
21s         Normal   Scheduled           pod/redis-687967dbc5-27vmr    Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
21s         Normal   Pulling             pod/redis-687967dbc5-27vmr    Pulling image "ghcr.io/moelove/redis:alpine"
15s         Normal   Pulled              pod/redis-687967dbc5-27vmr    Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
14s         Normal   Created             pod/redis-687967dbc5-27vmr    Created container redis
14s         Normal   Started             pod/redis-687967dbc5-27vmr    Started container redis
22s         Normal   SuccessfulCreate    replicaset/redis-687967dbc5   Created pod: redis-687967dbc5-27vmr
22s         Normal   ScalingReplicaSet   deployment/redis              Scaled up replica set redis-687967dbc5 to 1

然而咱们会发现默认状况下 kubectl get events 并没有依照 events 产生的程序进行排列,所以咱们往往须要为其减少 --sort-by='{.metadata.creationTimestamp}' 参数来让其输入能够按工夫进行排列。

这也是为何 Kubernetes v1.23 版本中会新增 kubectl alpha events 命令的起因。我在之前的文章《K8S 生态周报 | Kubernetes v1.23.0 正式公布,新个性一览》中已进行了具体的介绍,这里就不开展了。

按工夫排序后能够看到如下后果:

(MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}'
LAST SEEN   TYPE     REASON              OBJECT                        MESSAGE
2m12s       Normal   Scheduled           pod/redis-687967dbc5-27vmr    Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
2m13s       Normal   SuccessfulCreate    replicaset/redis-687967dbc5   Created pod: redis-687967dbc5-27vmr
2m13s       Normal   ScalingReplicaSet   deployment/redis              Scaled up replica set redis-687967dbc5 to 1
2m12s       Normal   Pulling             pod/redis-687967dbc5-27vmr    Pulling image "ghcr.io/moelove/redis:alpine"
2m6s        Normal   Pulled              pod/redis-687967dbc5-27vmr    Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
2m5s        Normal   Created             pod/redis-687967dbc5-27vmr    Created container redis
2m5s        Normal   Started             pod/redis-687967dbc5-27vmr    Started container redis

通过以上的操作,咱们能够发现 events 实际上是 Kubernetes 集群中的一种资源。当 Kubernetes 集群中资源状态发生变化时,能够产生新的 events

深刻 Events

单个 Event 对象

既然 events 是 Kubernetes 集群中的一种资源,失常状况下它的 metadata.name 中应该蕴含其名称,用于进行独自操作。所以咱们能够应用如下命令输入其 name:

(MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}' -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
redis-687967dbc5-27vmr.16c4fb7bde8c69d2
redis-687967dbc5.16c4fb7bde6b54c4
redis.16c4fb7bde1bf769
redis-687967dbc5-27vmr.16c4fb7bf8a0ab35
redis-687967dbc5-27vmr.16c4fb7d8ecaeff8
redis-687967dbc5-27vmr.16c4fb7d99709da9
redis-687967dbc5-27vmr.16c4fb7d9be30c06

抉择其中的任意一条 event 记录,将其输入为 YAML 格局进行查看:

(MoeLove) ➜ kubectl -n moelove get events redis-687967dbc5-27vmr.16c4fb7bde8c69d2 -o yaml
action: Binding
apiVersion: v1
eventTime: "2021-12-28T19:31:13.702987Z"
firstTimestamp: null
involvedObject:
  apiVersion: v1
  kind: Pod
  name: redis-687967dbc5-27vmr
  namespace: moelove
  resourceVersion: "330230"
  uid: 71b97182-5593-47b2-88cc-b3f59618c7aa
kind: Event
lastTimestamp: null
message: Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
metadata:
  creationTimestamp: "2021-12-28T19:31:13Z"
  name: redis-687967dbc5-27vmr.16c4fb7bde8c69d2
  namespace: moelove
  resourceVersion: "330235"
  uid: e5c03126-33b9-4559-9585-5e82adcd96b0
reason: Scheduled
reportingComponent: default-scheduler
reportingInstance: default-scheduler-kind-control-plane
source: {}
type: Normal

能够看到其中蕴含了很多信息, 这里咱们先不开展。咱们看另一个例子。

kubectl describe 中的 Events

咱们能够别离对 Deployment 对象和 Pod 对象执行 describe 的操作,能够失去如下后果(省略掉了两头输入):

  • 对 Deployment 操作
(MoeLove) ➜ kubectl -n moelove describe deploy/redis                
Name:                   redis
Namespace:              moelove
...
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  15m   deployment-controller  Scaled up replica set redis-687967dbc5 to 1
  • 对 Pod 操作
(MoeLove) ➜ kubectl -n moelove describe pods redis-687967dbc5-27vmr
Name:         redis-687967dbc5-27vmr                                                                 
Namespace:    moelove
Priority:     0
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  18m   default-scheduler  Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
  Normal  Pulling    18m   kubelet            Pulling image "ghcr.io/moelove/redis:alpine"
  Normal  Pulled     17m   kubelet            Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
  Normal  Created    17m   kubelet            Created container redis
  Normal  Started    17m   kubelet            Started container redis

咱们能够发现 对不同的资源对象进行 describe 的时候,能看到的 events 内容都是与本人有间接关联的 。在 describe Deployment 的时候,看不到 Pod 相干的 Events。

这阐明,Event 对象中是蕴含它所形容的资源对象的信息的 ,它们是有间接分割的。

联合后面咱们看到的单个 Event 对象,咱们发现 involvedObject 字段中内容就是与该 Event 相关联的资源对象的信息

更进一步理解 Events

咱们来看看如下的示例,创立一个 Deployment,然而应用一个不存在的镜像:

(MoeLove) ➜ kubectl -n moelove create deployment non-exist --image=ghcr.io/moelove/non-exist
deployment.apps/non-exist created
(MoeLove) ➜ kubectl -n moelove get pods
NAME                        READY   STATUS         RESTARTS   AGE
non-exist-d9ddbdd84-tnrhd   0/1     ErrImagePull   0          11s
redis-687967dbc5-27vmr      1/1     Running        0          26m

咱们能够看到以后的 Pod 处于一个 ErrImagePull 的状态。查看以后 namespace 中的 events (我省略掉了之前 deploy/redis 的记录)

(MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}'                                                           
LAST SEEN   TYPE      REASON              OBJECT                           MESSAGE
35s         Normal    SuccessfulCreate    replicaset/non-exist-d9ddbdd84   Created pod: non-exist-d9ddbdd84-tnrhd
35s         Normal    ScalingReplicaSet   deployment/non-exist             Scaled up replica set non-exist-d9ddbdd84 to 1
35s         Normal    Scheduled           pod/non-exist-d9ddbdd84-tnrhd    Successfully assigned moelove/non-exist-d9ddbdd84-tnrhd to kind-worker3
17s         Warning   Failed              pod/non-exist-d9ddbdd84-tnrhd    Error: ErrImagePull
17s         Warning   Failed              pod/non-exist-d9ddbdd84-tnrhd    Failed to pull image "ghcr.io/moelove/non-exist": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/moelove/non-exist:latest": failed to resolve reference "ghcr.io/moelove/non-exist:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
18s         Normal    Pulling             pod/non-exist-d9ddbdd84-tnrhd    Pulling image "ghcr.io/moelove/non-exist"
4s          Warning   Failed              pod/non-exist-d9ddbdd84-tnrhd    Error: ImagePullBackOff
4s          Normal    BackOff             pod/non-exist-d9ddbdd84-tnrhd    Back-off pulling image "ghcr.io/moelove/non-exist"

对这个 Pod 执行 describe 操作:

(MoeLove) ➜ kubectl -n moelove describe pods non-exist-d9ddbdd84-tnrhd
...
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m                     default-scheduler  Successfully assigned moelove/non-exist-d9ddbdd84-tnrhd to kind-worker3
  Normal   Pulling    2m22s (x4 over 3m59s)  kubelet            Pulling image "ghcr.io/moelove/non-exist"
  Warning  Failed     2m21s (x4 over 3m59s)  kubelet            Failed to pull image "ghcr.io/moelove/non-exist": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/moelove/non-exist:latest": failed to resolve reference "ghcr.io/moelove/non-exist:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
  Warning  Failed     2m21s (x4 over 3m59s)  kubelet            Error: ErrImagePull
  Warning  Failed     2m9s (x6 over 3m58s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    115s (x7 over 3m58s)   kubelet            Back-off pulling image "ghcr.io/moelove/non-exist"

咱们能够发现,这里的输入和之前正确运行 Pod 的不一样。最次要的区别在与 Age 列。这里咱们看到了相似 115s (x7 over 3m58s) 这样的输入。

它的含意示意: 该类型的 event 在 3m58s 中曾经产生了 7 次,最近的一次产生在 115s 之前

然而当咱们去间接 kubectl get events 的时候,咱们并没有看到有 7 次反复的 event。这阐明 Kubernetes 会主动将反复的 events 进行合并

抉择最初一条 Events (办法后面内容曾经讲了) 并将其内容应用 YAML 格局进行输入:

(MoeLove) ➜ kubectl -n moelove get events non-exist-d9ddbdd84-tnrhd.16c4fce570cfba46 -o yaml
apiVersion: v1
count: 43
eventTime: null
firstTimestamp: "2021-12-28T19:57:06Z"
involvedObject:
  apiVersion: v1
  fieldPath: spec.containers{non-exist}
  kind: Pod
  name: non-exist-d9ddbdd84-tnrhd
  namespace: moelove
  resourceVersion: "333366"
  uid: 33045163-146e-4282-b559-fec19a189a10
kind: Event
lastTimestamp: "2021-12-28T18:07:14Z"
message: Back-off pulling image "ghcr.io/moelove/non-exist"
metadata:
  creationTimestamp: "2021-12-28T19:57:06Z"
  name: non-exist-d9ddbdd84-tnrhd.16c4fce570cfba46
  namespace: moelove
  resourceVersion: "334638"
  uid: 60708be0-23b9-481b-a290-dd208fed6d47
reason: BackOff
reportingComponent: ""reportingInstance:""
source:
  component: kubelet
  host: kind-worker3
type: Normal

这里咱们能够看到其字段中包含一个 count 字段,示意同类 event 产生了多少次。以及 firstTimestamplastTimestamp 别离示意了这个 event 首次呈现了最近一次呈现的工夫。这样也就解释了后面的输入中 events 继续的周期了。

彻底搞懂 Events

以下内容是从 Events 中轻易抉择的一条,咱们能够看到它蕴含的一些字段信息:

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2021-12-28T19:31:13Z"
involvedObject:
  apiVersion: apps/v1
  kind: ReplicaSet
  name: redis-687967dbc5
  namespace: moelove
  resourceVersion: "330227"
  uid: 11e98a9d-9062-4ccb-92cb-f51cc74d4c1d
kind: Event
lastTimestamp: "2021-12-28T19:31:13Z"
message: 'Created pod: redis-687967dbc5-27vmr'
metadata:
  creationTimestamp: "2021-12-28T19:31:13Z"
  name: redis-687967dbc5.16c4fb7bde6b54c4
  namespace: moelove
  resourceVersion: "330231"
  uid: 8e37ec1e-b3a1-420c-96d4-3b3b2995c300
reason: SuccessfulCreate
reportingComponent: ""reportingInstance:""
source:
  component: replicaset-controller
type: Normal

其中次要字段的含意如下:

  • count: 示意以后同类的事件产生了多少次(后面曾经介绍)
  • involvedObject: 与此 event 有间接关联的资源对象(后面曾经介绍), 构造如下:
type ObjectReference struct {
    Kind string
    Namespace string
    Name string
    UID types.UID
    APIVersion string
    ResourceVersion string
    FieldPath string
}
  • source: 间接关联的组件, 构造如下:
type EventSource struct {
    Component string
    Host string
}
  • reason: 简略的总结(或者一个固定的代码),比拟适宜用于做筛选条件,次要是为了让机器可读,以后有超过 50 种这样的代码;
  • message: 给一个更易让人读懂的具体阐明
  • type: 以后只有 NormalWarning 两种类型, 源码中也别离写了其含意:
// staging/src/k8s.io/api/core/v1/types.go
const (
    // Information only and will not cause any problems
    EventTypeNormal string = "Normal"
    // These events are to warn that something might go wrong
    EventTypeWarning string = "Warning"
)

所以,当咱们将这些 Events 都作为 tracing 的 span 采集回来后,就能够依照其 source 进行分类,按 involvedObject 进行关联,按工夫进行排序了。

总结

在这篇文章中,我次要通过两个示例,一个正确部署的 Deploy,以及一个应用不存在镜像部署的 Deploy,深刻的介绍了 Events 对象的理论的作用及其各个字段的含意。

对于 Kubernetes 而言,Events 中蕴含了很多有用的信息,然而这些信息却并不会对 Kubernetes 造成什么影响,它们也并不是理论的 Kubernetes 的日志。默认状况下 Kubernetes 中的日志在 1 小时后就会被清理掉,以便开释对 etcd 的资源占用。

所以为了能更好的让集群管理员晓得产生了什么,在生产环境中,咱们通常会把 Kubernetes 集群的 events 也给采集回来。我集体比拟举荐的工具是:https://github.com/opsgenie/k…

当然你也能够依照我之前的文章《更优雅的 Kubernetes 集群事件度量计划》,利用 Jaeger 利用 tracing 的形式来采集 Kubernetes 集群中的 events 并进行展现。


欢送订阅我的文章公众号【MoeLove】

正文完
 0