Background

The edge cluster (based on Raspberry Pi + K3S) needs basic alerting capability.

Edge cluster constraints

  1. CPU/memory/storage resources are tight and cannot support a full Prometheus-based monitoring stack, which needs at least 2 GB of memory and considerable storage (even a Prometheus Agent setup is too heavy); extra storage and compute consumption must be avoided.
  2. Network conditions cannot support a monitoring stack either, because such stacks generally transfer a non-trivial amount of data every minute (or continuously):

    1. Some sites run on paid 5G links: destination addresses must be explicitly opened up, traffic is billed by volume, transfer capacity is limited by the 5G conditions, and the link is unstable (it may go offline for stretches of time).

Key requirements

To summarize, the key requirements are:

  1. Timely alerting on edge-cluster anomalies: we need to know what is going wrong in the edge cluster;
  2. Network: conditions are poor and traffic is scarce; only a very small number of destination addresses can be opened up, and intermittent offline periods must be tolerated;
  3. Resources: extra storage and compute consumption must be kept to a minimum.

Solution

Based on the above, the following solution is adopted:

Alert notifications based on Kubernetes Events

Architecture diagram

Technical plan

  1. Collect Events from the various Kubernetes resources, such as:

    1. pod
    2. node
    3. kubelet
    4. crd
    5. ...
  2. Use the kubernetes-event-exporter component to collect Kubernetes Events;
  3. Select only Warning-level Events for alert notifications (the filter conditions can be refined later);
  4. Send alerts via messaging tools such as a Feishu webhook (more channels can be added later).
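The routing in steps 2-4 boils down to a simple predicate over event types. A minimal Python sketch (illustrative only; `should_alert` and the sample events are hypothetical stand-ins, not the exporter's actual code):

```python
# Model of the exporter's routing rule: drop "Normal" events,
# forward "Warning" events to the alert receiver.
def should_alert(event: dict) -> bool:
    """Return True when an event should be forwarded to the alert receiver."""
    return event.get("type") == "Warning"

events = [
    {"type": "Normal", "reason": "Scheduled", "message": "Successfully assigned pod"},
    {"type": "Warning", "reason": "FailedScheduling", "message": "0/3 nodes are available"},
    {"type": "Warning", "reason": "BackOff", "message": "Back-off restarting failed container"},
]

alerts = [e for e in events if should_alert(e)]
print([e["reason"] for e in alerts])  # → ['FailedScheduling', 'BackOff']
```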

Implementation steps

Manual approach:

Run the following operations on the edge cluster:

1. Create roles

As follows:

```shell
cat << _EOF_ | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
_EOF_
```

2. Create the kubernetes-event-exporter config

As follows:

```shell
cat << _EOF_ | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: XXX IoT K3S cluster alerts
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:**  {{ .Type }}\n**EventKind:**  {{ .InvolvedObject.Kind }}\n**EventReason:**  {{ .Reason }}\n**EventTime:**  {{ .LastTimestamp }}\n**EventMessage:**  {{ .Message }}"
_EOF_
```

Note:

  • endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/...": change this to your own webhook endpoint as needed. ❌ Never publish it!
  • content: XXX IoT K3S cluster alerts: adjust this to a name you can recognize at a glance, e.g. "Home test K3S cluster alerts"
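To preview what actually reaches Feishu, the card layout can be rendered by hand. A minimal Python sketch (the `render_card` helper and the sample event are hypothetical; the field layout mirrors the ConfigMap above, where the real exporter performs this substitution via the {{ .Type }} Go templates):

```python
import json

def render_card(event: dict, title: str) -> str:
    """Build the Feishu interactive-card body posted to the webhook."""
    content = (
        f"**EventType:**  {event['type']}\n"
        f"**EventKind:**  {event['kind']}\n"
        f"**EventReason:**  {event['reason']}\n"
        f"**EventTime:**  {event['last_timestamp']}\n"
        f"**EventMessage:**  {event['message']}"
    )
    payload = {
        "msg_type": "interactive",
        "card": {
            "config": {"wide_screen_mode": True, "enable_forward": True},
            "header": {
                "title": {"tag": "plain_text", "content": title},
                "template": "red",
            },
            "elements": [
                {"tag": "div", "text": {"tag": "lark_md", "content": content}}
            ],
        },
    }
    return json.dumps(payload, ensure_ascii=False)

body = render_card(
    {
        "type": "Warning",
        "kind": "Pod",
        "reason": "BackOff",
        "last_timestamp": "2022-01-01T00:00:00Z",
        "message": "Back-off restarting failed container",
    },
    title="Home test K3S cluster alerts",
)
```

Posting this body to the webhook endpoint with a Content-Type: application/json header should produce a red alert card in the Feishu group.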

3. Create the Deployment

```shell
cat << _EOF_ | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
_EOF_
```

Notes:

  1. The event-exporter-cfg items load the configuration file stored as a ConfigMap;
  2. The localtime, zoneinfo, and TZ items set the pod's time zone to Asia/Shanghai, so the final notifications display in CST;
  3. The affinity and tolerations items ensure that the pod is preferentially scheduled onto a master node; adjust as needed. This is done because in an edge cluster the master usually acts as the gateway, has a higher-spec configuration, and stays online longer.
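The effect of those soft affinity preferences can be illustrated with a toy scoring function (hypothetical node names and labels; the real kube-scheduler combines many more factors):

```python
# Each matching preferredDuringSchedulingIgnoredDuringExecution term adds
# its weight (100 in the Deployment above) to a node's score; the pod is
# then biased toward the highest-scoring node.
MASTER_LABEL_KEYS = [
    "node-role.kubernetes.io/controlplane",
    "node-role.kubernetes.io/control-plane",
    "node-role.kubernetes.io/master",
]

def affinity_score(node_labels: dict) -> int:
    """Sum the weights of the preferred-affinity terms this node matches."""
    return sum(100 for key in MASTER_LABEL_KEYS if node_labels.get(key) == "true")

nodes = {
    "edge-master": {"node-role.kubernetes.io/master": "true"},
    "edge-worker-1": {},
}
best = max(nodes, key=lambda name: affinity_score(nodes[name]))
print(best)  # → edge-master
```

Because these are preferences rather than hard requirements, the pod still schedules onto a worker if no master is available.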

Automated deployment

Result: deployed automatically when K3S is installed.

On the node where the K3S server runs, create event-exporter.yaml in the /var/lib/rancher/k3s/server/manifests/ directory (create the directory first if it does not exist):

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: xxx K3S cluster alerts
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:**  {{ .Type }}\n**EventKind:**  {{ .InvolvedObject.Kind }}\n**EventReason:**  {{ .Reason }}\n**EventTime:**  {{ .LastTimestamp }}\n**EventMessage:**  {{ .Message }}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
```

After that, the manifest is deployed automatically whenever K3S starts.

Reference:
Auto-Deploying Manifests and Helm Charts | Rancher Docs

Final result

As shown in the screenshot below:

References

  • opsgenie/kubernetes-event-exporter: Export Kubernetes events to multiple destinations with routing and filtering (github.com)
  • AliyunContainerService/kube-eventer: kube-eventer emit kubernetes events to sinks (github.com)
  • kubesphere/kube-events: K8s Event Exporting, Filtering and Alerting in Multi-Tenant Environment (github.com)
  • kubesphere/notification-manager: K8s native notification management with multi-tenancy support (github.com)
When three people walk together, one of them can teach me; knowledge shared belongs to all. This article was written by the 东风微鸣 technical blog, EWhisper.cn.