Using Prometheus to Monitor eKuiper Rule Status

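Prometheus is an open-source system monitoring and alerting toolkit hosted by the CNCF. Many companies and organizations have adopted it as their monitoring and alerting tool.

An eKuiper rule is a continuously running stream-processing job. Rules process unbounded data streams: under normal conditions, once a rule is started it keeps running and keeps producing runtime status data until it is stopped manually or stops after an unrecoverable error. Rules in eKuiper expose a status API for retrieving their runtime metrics. eKuiper also integrates with Prometheus, which makes it convenient to monitor these status metrics.

This tutorial is intended for users who already have a basic understanding of eKuiper. It introduces the rule status metrics and shows how to monitor specific metrics with Prometheus.

Once a rule has been created and is running successfully in eKuiper, its runtime status metrics can be viewed through the CLI, the REST API, or the management console. For example, for an existing rule rule1, running curl -X GET "http://127.0.0.1:9081/rules/rule1/status" returns the rule's runtime metrics in JSON format, as shown below: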

{
  "status": "running",
  "source_demo_0_records_in_total": 265,
  "source_demo_0_records_out_total": 265,
  "source_demo_0_process_latency_us": 0,
  "source_demo_0_buffer_length": 0,
  "source_demo_0_last_invocation": "2022-08-22T17:19:10.979128",
  "source_demo_0_exceptions_total": 0,
  "source_demo_0_last_exception": "",
  "source_demo_0_last_exception_time": 0,
  "op_2_project_0_records_in_total": 265,
  "op_2_project_0_records_out_total": 265,
  "op_2_project_0_process_latency_us": 0,
  "op_2_project_0_buffer_length": 0,
  "op_2_project_0_last_invocation": "2022-08-22T17:19:10.979128",
  "op_2_project_0_exceptions_total": 0,
  "op_2_project_0_last_exception": "",
  "op_2_project_0_last_exception_time": 0,
  "sink_mqtt_0_0_records_in_total": 265,
  "sink_mqtt_0_0_records_out_total": 265,
  "sink_mqtt_0_0_process_latency_us": 0,
  "sink_mqtt_0_0_buffer_length": 0,
  "sink_mqtt_0_0_last_invocation": "2022-08-22T17:19:10.979128",
  "sink_mqtt_0_0_exceptions_total": 0,
  "sink_mqtt_0_0_last_exception": "",
  "sink_mqtt_0_0_last_exception_time": 0
}

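The runtime metrics consist of two parts. The first is status, which indicates whether the rule is running normally; its value can be running, stopped manually, and so on. The second is the runtime metrics of each operator in the rule. Operators are generated from the rule's SQL, so they can differ from rule to rule. In this example, the rule SQL is the simplest possible SELECT * FROM demo with an MQTT action, which generates three operators: [source_demo, op_project, sink_mqtt]. Every operator has the same set of runtime metrics, and each metric name is combined with the operator name to form the full metric key. For example, the input count records_in_total of operator source_demo_0 is reported as source_demo_0_records_in_total.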
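The naming scheme described above can be sketched in Python. This is a minimal illustration rather than anything provided by eKuiper: the helper name group_by_operator and the METRIC_NAMES tuple are assumptions made for this example, and the sample dictionary is a trimmed copy of the status output shown earlier.

```python
# Metric suffixes described in this article. The tuple itself is an
# assumption made for illustration, not part of any eKuiper API.
METRIC_NAMES = (
    "records_in_total",
    "records_out_total",
    "process_latency_us",
    "buffer_length",
    "last_invocation",
    "exceptions_total",
    "last_exception_time",
    "last_exception",
)


def group_by_operator(status):
    """Split flat status keys into {operator: {metric: value}}.

    Keys that do not end in a known metric suffix (such as "status")
    are skipped.
    """
    grouped = {}
    for key, value in status.items():
        for metric in METRIC_NAMES:
            suffix = "_" + metric
            if key.endswith(suffix):
                operator = key[: -len(suffix)]
                grouped.setdefault(operator, {})[metric] = value
                break
    return grouped


# Trimmed sample taken from the status output above.
sample = {
    "status": "running",
    "source_demo_0_records_in_total": 265,
    "source_demo_0_last_exception_time": 0,
    "sink_mqtt_0_0_records_in_total": 265,
}
grouped = group_by_operator(sample)
print(grouped["source_demo_0"]["records_in_total"])
```

With a live instance, the input dictionary could come from the status endpoint, for example by passing the response of urllib.request.urlopen to json.load.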

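The runtime metrics are the same for every operator. The main ones are:

records_in_total: total number of messages read in, indicating how many messages have been processed since the rule started.
records_out_total: total number of messages output, indicating how many messages the operator processed successfully.
process_latency_us: latency of the most recent processing, in microseconds. This is an instantaneous value that reflects the operator's processing performance. The latency of the rule as a whole is generally determined by the operator with the highest latency.
buffer_length: length of the operator's buffer. Because operators compute at different speeds, there is a buffer queue between operators. A large buffer length means the operator is processing slowly and cannot keep up with the upstream rate.
last_invocation: time of the operator's most recent run.
exceptions_total: total number of exceptions. Recoverable errors that occur while the operator is running, such as a dropped connection or malformed data, are counted as exceptions and do not interrupt the rule.

Since version 1.6.1, two more exception-related metrics have been added to make exceptions easier to debug.

last_exception: error message of the most recent exception.
last_exception_time: time at which the most recent exception occurred.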
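As a rough sketch of how these metrics might be read together, the Python snippet below flags operators that report exceptions or a long buffer and picks the operator with the highest latency. The function names, the max_buffer threshold of 100, and the sample values are all hypothetical and chosen only for illustration. The input is assumed to be a dictionary of the form {operator: {metric: value}}.

```python
def find_problem_operators(metrics_by_operator, max_buffer=100):
    """Return operators that report exceptions or a buffer above max_buffer.

    max_buffer is an arbitrary illustrative threshold, not an eKuiper default.
    """
    problems = []
    for operator, metrics in metrics_by_operator.items():
        has_exceptions = metrics.get("exceptions_total", 0) > 0
        has_backlog = metrics.get("buffer_length", 0) > max_buffer
        if has_exceptions or has_backlog:
            problems.append(operator)
    return problems


def slowest_operator(metrics_by_operator):
    """Return the operator with the largest process_latency_us."""
    return max(
        metrics_by_operator,
        key=lambda op: metrics_by_operator[op].get("process_latency_us", 0),
    )


# Hypothetical values, not taken from a real rule.
sample_metrics = {
    "source_demo_0": {"exceptions_total": 0, "buffer_length": 0, "process_latency_us": 5},
    "op_2_project_0": {"exceptions_total": 2, "buffer_length": 0, "process_latency_us": 40},
    "sink_mqtt_0_0": {"exceptions_total": 0, "buffer_length": 250, "process_latency_us": 12},
}
print(find_problem_operators(sample_metrics))
print(slowest_operator(sample_metrics))
```

Because process_latency_us is an instantaneous value, a single reading says little on its own. Sampling it over time, which is what Prometheus does, gives a more reliable picture.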

All numeric metrics among these can be monitored with Prometheus. The next section describes how to configure the Prometheus service in eKuiper.

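eKuiper ships with a built-in Prometheus service, but it is disabled by default. To enable it, edit the configuration in etc/kuiper.yaml. There, prometheus is a boolean, and setting it to true turns the service on. prometheusPort sets the port on which the service is exposed.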

  prometheus: true
  prometheusPort: 20499

If eKuiper is started with Docker, the service can also be enabled through environment variables.

docker run -p 9081:9081 -d --name ekuiper -e MQTT_SOURCE__DEFAULT__SERVER="$MQTT_BROKER_ADDRESS" -e KUIPER__BASIC__PROMETHEUS=true lfedge/ekuiper:$tag

The startup log contains information about the service starting, for example:

time="2022-08-22 17:16:50" level=info msg="Serving prometheus metrics on port http://localhost:20499/metrics" file="server/prome_init.go:60"
Serving prometheus metrics on port http://localhost:20499/metrics

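Open the address shown in the message, http://localhost:20499/metrics, to see the raw eKuiper metrics collected for Prometheus. Once a rule is running normally in eKuiper, metrics such as kuiper_sink_records_in_total can be found on that page. Users can then configure Prometheus to scrape eKuiper for a richer display.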
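The metrics page is plain text in the Prometheus exposition format, so it can also be inspected with a short script. The Python sketch below filters the samples whose names start with kuiper_. It assumes samples carry no trailing timestamp, and the sample text, including the rule label and the HELP wording, is hypothetical and does not reproduce real eKuiper output.

```python
def parse_samples(text, prefix="kuiper_"):
    """Return (name_with_labels, value) pairs for samples starting with prefix.

    Simplification: assumes each sample line is "<name>{labels} <value>"
    with no trailing timestamp.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        if name.startswith(prefix):
            samples.append((name, float(value)))
    return samples


# Hypothetical page content for illustration only.
page = """\
# HELP kuiper_sink_records_in_total example help text
# TYPE kuiper_sink_records_in_total counter
kuiper_sink_records_in_total{rule="rule1"} 265
go_goroutines 42
"""
for name, value in parse_samples(page):
    print(name, value)
```

For anything beyond a quick check, a maintained parser for the exposition format is a safer choice than splitting lines by hand.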

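So far we have exported the eKuiper status as Prometheus metrics. Next, we configure Prometheus to scrape these metrics and set up some basic monitoring.

Download the build for your operating system from the official Prometheus website and extract it.

Edit the configuration file so that Prometheus monitors eKuiper. Open prometheus.yml and modify the scrape_configs section as follows: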

global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: ekuiper
    static_configs:
      - targets: ['localhost:20499']

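This defines a scrape job named ekuiper whose targets point to the address of the service started in the previous section. Once the configuration is done, start Prometheus.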

./prometheus --config.file=prometheus.yml

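After it starts successfully, open http://localhost:9090/ to enter the management console.

As an example, monitor how the number of messages received by the sinks of all rules changes over time. Enter the name of the metric to monitor in the search box and click Execute to generate the monitoring table. Select Graph to switch to a line chart or another display format.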
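The same expression can also be evaluated outside the console through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The Python sketch below builds the query URL and extracts the values from a response. The helper names are assumptions made for this example, and the sample response uses hypothetical labels and values.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def extract_values(response):
    """Return (labels, value) pairs from an instant-query vector response."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    return [
        (item["metric"], float(item["value"][1]))
        for item in response["data"]["result"]
    ]


def query(base_url, expr):
    """Run the query against a live Prometheus server (needs network access)."""
    with urllib.request.urlopen(build_query_url(base_url, expr)) as resp:
        return extract_values(json.load(resp))


# Hypothetical response body for illustration only.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "kuiper_sink_records_in_total", "job": "ekuiper"},
                "value": [1661159950.0, "265"],
            }
        ],
    },
}
print(build_query_url("http://localhost:9090/", "kuiper_sink_records_in_total"))
print(extract_values(sample_response))
```

Against the setup in this tutorial, query("http://localhost:9090", "kuiper_sink_records_in_total") would return one entry per time series that Prometheus has scraped from eKuiper.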

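Click Add Panel and configure it in the same way to monitor more metrics.

This article introduced the rule status metrics in eKuiper and showed how to monitor them with a simple Prometheus setup. From here, users can explore the more advanced features of Prometheus to operate eKuiper more effectively.

Copyright notice: this article is original content by EMQ. Please credit the source when republishing.

Original article: https://www.emqx.com/zh/blog/use-prometheus-to-monitor-ekuiper-rules-status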
