Background
The edge cluster (Raspberry Pi + K3S) needs basic alerting capability.
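Edge cluster constraints
- CPU, memory, and storage are tight. The cluster cannot support a full Prometheus-based monitoring stack, which needs at least 2 GB of memory and a large amount of storage (even Prometheus Agent is too heavy). Extra storage and compute consumption must be avoided.
- Network conditions cannot support a monitoring stack, which typically transmits data on a schedule (every minute, or continuously) and in non-trivial volume.
- Some sites are on a metered 5G network: each destination address must be explicitly allowed, traffic is billed by volume, and throughput is limited and unstable (the cluster may be offline for a while).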
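Key requirements
In summary, the key requirements are:
- Timely alerting on edge cluster anomalies: we need to know about abnormal conditions as they happen;
- Network: conditions are poor, so traffic must stay low, only a very small number of destination addresses can be allowed, and the solution must tolerate an unstable network (periods offline);
- Resources: avoid extra storage and compute consumption as far as possible.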
Solution
Given the above, the solution is:
Alert notifications based on Kubernetes Events
[Figure: architecture diagram]
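Technical plan
- Collect Events from the various Kubernetes resources, such as:
  - pod
  - node
  - kubelet
  - crd
  - ...
- Use the kubernetes-event-exporter component to collect the Kubernetes Events;
- Select only Warning-level Events for alerting (the conditions can be refined later);
- Send alerts through messaging tools such as a Feishu webhook (more channels can be added later).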
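Before deploying anything, it can help to preview which Events would qualify as alerts. The sketch below wraps standard kubectl flags in two helper functions; the function names are made up for this example and are not part of kubernetes-event-exporter.

```shell
# Print only Warning-level events from every namespace, oldest first.
# Uses standard kubectl flags; nothing here is specific to the exporter.
list_warning_events() {
  kubectl get events --all-namespaces \
    --field-selector type=Warning \
    --sort-by=.lastTimestamp
}

# Count current Warning events to get a rough feel for alert volume.
count_warning_events() {
  kubectl get events --all-namespaces \
    --field-selector type=Warning --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

Running list_warning_events on the edge cluster shows roughly what the Feishu channel would receive. A large count suggests tightening the filter conditions before switching notifications on.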
Implementation steps
Manual approach:
On the edge cluster, do the following:
1. Create the roles, as follows:
cat << _EOF_ | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# Extra read access to nodes, which the built-in "view" ClusterRole does not grant
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must match the namespace of the ServiceAccount defined above
    namespace: monitoring
    name: event-exporter
_EOF_
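To confirm that the bindings took effect, one option is to ask the API server what the ServiceAccount is allowed to do. This is a minimal sketch built on kubectl auth can-i with impersonation; the helper names sa_can and check_exporter_rbac are invented for illustration.

```shell
# Ask whether the exporter's ServiceAccount may perform <verb> on <resource>.
# Prints "yes" or "no", as reported by kubectl auth can-i.
sa_can() {
  kubectl auth can-i "$1" "$2" \
    --as=system:serviceaccount:monitoring:event-exporter
}

# Check the read permissions the exporter relies on: events and nodes.
check_exporter_rbac() {
  for resource in events nodes; do
    for verb in get list watch; do
      printf '%s %s: %s\n' "$verb" "$resource" "$(sa_can "$verb" "$resource")"
    done
  done
}
```

All six lines printed by check_exporter_rbac should end in yes. If the nodes checks report no, a likely cause is that the subject namespace in the event-exporter-extra binding does not match the namespace of the ServiceAccount.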
2. Create the kubernetes-event-exporter config, as follows:
cat << _EOF_ | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        # Send every event to stdout
        - match:
            - receiver: "dump"
        # Drop Normal events and forward the rest to Feishu
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: XXX IoT K3S 集群告警
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
_EOF_
Notes:
- endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..." : change this to your own webhook endpoint. ❌ Never publish it!
- content: XXX IoT K3S 集群告警 : change this to a name that is quick to recognize, for example "Home test K3S cluster alert".
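Since the webhook address has to be allowed on the edge network, it may be worth testing it from the edge node before relying on the exporter. The sketch below posts a plain-text message in the Feishu custom bot format. FEISHU_WEBHOOK is a hypothetical environment variable for this example, used so the secret URL never appears in a script or in shell history.

```shell
# Send a plain-text test message to a Feishu custom-bot webhook.
# FEISHU_WEBHOOK (hypothetical variable) must hold the full webhook URL.
# The text must not contain double quotes or backslashes; no JSON escaping is done.
send_feishu_test() {
  : "${FEISHU_WEBHOOK:?set FEISHU_WEBHOOK to the webhook URL first}"
  text=${1:-"event-exporter webhook test"}
  curl -sS -X POST "$FEISHU_WEBHOOK" \
    -H 'Content-Type: application/json' \
    -d "{\"msg_type\":\"text\",\"content\":{\"text\":\"$text\"}}"
}
```

Calling send_feishu_test "hello from the edge" should make a message appear in the target group. If the request hangs or is refused, the destination address is probably not yet allowed on the 5G network.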
3. Create the Deployment
cat << _EOF_ | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
_EOF_
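Explanation:
- The event-exporter-cfg settings load the configuration file stored as a ConfigMap;
- The localtime, zoneinfo, and TZ settings change the pod's time zone to Asia/Shanghai, so that notifications display times in CST;
- The affinity and tolerations settings make the scheduler prefer a master node; adjust as needed. The reason is that in an edge cluster the master often acts as the gateway, has higher specs, and stays online longer.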
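Once the Deployment is applied, a quick end-to-end check is to confirm the rollout and then provoke a Warning event on purpose. The following is a sketch only: the pod name and the unresolvable image are placeholders chosen for this example, and the helper function names are hypothetical.

```shell
# Wait for the exporter Deployment to become ready, then show which node it landed on.
check_exporter_rollout() {
  kubectl -n monitoring rollout status deployment/event-exporter --timeout=120s &&
    kubectl -n monitoring get pods -l app=event-exporter -o wide
}

# Provoke Warning events by starting a pod whose image cannot be pulled.
# The ".invalid" domain never resolves, so the pull is certain to fail.
trigger_test_warning() {
  kubectl run event-exporter-smoke-test \
    --image=registry.invalid/does-not-exist:latest --restart=Never
}

# Remove the throwaway pod once the alert has arrived.
cleanup_test_warning() {
  kubectl delete pod event-exporter-smoke-test --ignore-not-found
}
```

After trigger_test_warning, an image pull failure alert should reach the Feishu group shortly. Run cleanup_test_warning promptly afterwards, because the kubelet keeps retrying the pull and each retry can produce further Warning events.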
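Automated deployment
Effect: the exporter is deployed automatically when K3S is installed.
On the node running the K3S server, in the /var/lib/rancher/k3s/server/manifests/ directory (create it first if it does not exist), create event-exporter.yaml with the following content: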
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# Extra read access to nodes, which the built-in "view" ClusterRole does not grant
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: event-exporter-extra
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: monitoring
  name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: monitoring
    name: event-exporter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter-extra
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: event-exporter-extra
subjects:
  - kind: ServiceAccount
    # Must match the namespace of the ServiceAccount defined above
    namespace: monitoring
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: monitoring
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        # Send every event to stdout
        - match:
            - receiver: "dump"
        # Drop Normal events and forward the rest to Feishu
        - drop:
            - type: "Normal"
          match:
            - receiver: "feishu"
    receivers:
      - name: "dump"
        stdout: {}
      - name: "feishu"
        webhook:
          # Replace with your own webhook endpoint; never publish it
          endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/..."
          headers:
            Content-Type: application/json
          layout:
            msg_type: interactive
            card:
              config:
                wide_screen_mode: true
                enable_forward: true
              header:
                title:
                  tag: plain_text
                  content: xxxK3S集群告警
                template: red
              elements:
                - tag: div
                  text:
                    tag: lark_md
                    content: "**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
            defaultMode: 420
        - name: localtime
          hostPath:
            path: /etc/localtime
            type: ''
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
            type: ''
      containers:
        - name: event-exporter
          image: ghcr.io/opsgenie/kubernetes-event-exporter:v0.11
          args:
            - '-conf=/data/config.yaml'
          env:
            - name: TZ
              value: Asia/Shanghai
          volumeMounts:
            - name: cfg
              mountPath: /data
            - name: localtime
              readOnly: true
              mountPath: /etc/localtime
            - name: zoneinfo
              readOnly: true
              mountPath: /usr/share/zoneinfo
          imagePullPolicy: IfNotPresent
      serviceAccount: event-exporter
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/controlplane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: In
                    values:
                      - 'true'
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/master
                    operator: In
                    values:
                      - 'true'
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: 'true'
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
After that, K3S deploys it automatically when it starts.
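Placing the file can be scripted as part of node provisioning. The helper below is a minimal sketch: install_k3s_manifest and the MANIFEST_DIR override are names invented for this example, and the default path is the K3S manifests directory mentioned above.

```shell
# Copy a manifest into the K3S auto-deploy directory, creating the directory if needed.
# MANIFEST_DIR (hypothetical override) defaults to the K3S server manifests path.
install_k3s_manifest() {
  src=$1
  dir=${MANIFEST_DIR:-/var/lib/rancher/k3s/server/manifests}
  mkdir -p "$dir" && cp "$src" "$dir/"
}
```

On a real server node this needs root privileges, for example by running the provisioning script as root and calling install_k3s_manifest event-exporter.yaml. Afterwards, kubectl -n monitoring get deployment event-exporter is one way to confirm that the manifest was applied.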
Reference: Auto-deploying manifests and Helm charts | Rancher documentation
Final result
[Figure: the resulting alert notification]
References
- opsgenie/kubernetes-event-exporter: Export Kubernetes events to multiple destinations with routing and filtering (github.com)
- AliyunContainerService/kube-eventer: kube-eventer emit kubernetes events to sinks (github.com)
- kubesphere/kube-events: K8s Event Exporting, Filtering and Alerting in Multi-Tenant Environment (github.com)
- kubesphere/notification-manager: K8s native notification management with multi-tenancy support (github.com)
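Among any three people walking together, one can be my teacher; knowledge shared belongs to everyone. This article was written by the 东风微鸣 tech blog, EWhisper.cn.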