Flink Monitoring Built on Prometheus, Grafana, and Pushgateway

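Prometheus

Prometheus is a monitoring solution for microservice architectures, and it is closely tied to containers. On August 9, 2006, Eric Schmidt first introduced the concept of cloud computing at a search engine conference, and cloud computing advanced rapidly over the following decade and more. In 2013, Matt Stine of Pivotal introduced the concept of cloud native. Cloud native combines microservice architecture, DevOps, and agile infrastructure represented by containers, and it helps enterprises deliver software quickly, continuously, reliably, and at scale.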

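Prometheus is also flexible in how it collects data. To collect monitoring data from a target, you first install a collection component on the target, called an exporter. The exporter gathers monitoring data on the target and exposes an HTTP endpoint for Prometheus to query. Prometheus collects data by pulling, which differs from the traditional push model. Prometheus does offer a way to support push as well: you push your data to the Pushgateway, and Prometheus then pulls it from the Pushgateway. Existing exporters already cover the great majority of third-party systems, such as Docker, HAProxy, StatsD, and JMX, and the official site maintains a list of exporters.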
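To make the pull model concrete, here is a minimal sketch of what an exporter does: it renders metrics in the Prometheus text format and serves them at /metrics. The metric names, the `METRICS` dictionary, and the `serve` helper are hypothetical and exist only for this illustration; a real exporter would normally use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory metrics; a real exporter would read these from the system.
METRICS = {"demo_requests_total": 42, "demo_temperature_celsius": 21.5}


def render_metrics(metrics):
    """Render a name -> value mapping as one 'name value' line per metric."""
    lines = [f"{name} {metrics[name]}" for name in sorted(metrics)]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current metrics at /metrics so that Prometheus can pull them."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    """Block forever, answering scrape requests on the given port."""
    HTTPServer(("localhost", port), MetricsHandler).serve_forever()
```

Calling `serve(8000)` (the port is an arbitrary choice for this sketch) and adding `localhost:8000` as a scrape target would let Prometheus pull these two values on every scrape.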

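As the Prometheus architecture diagram shows, the Prometheus ecosystem includes several key components: Prometheus server, Pushgateway, Alertmanager, Web UI, and others. Most of them are optional. The core component is the Prometheus server, which collects and stores metric data, supports expression queries, and generates alerts.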

Installation commands

wget https://github.com/prometheus/prometheus/releases/download/v2.7.2/prometheus-2.7.2.linux-amd64.tar.gz
tar xvfz prometheus-2.7.2.linux-amd64.tar.gz
cd prometheus-2.7.2.linux-amd64
./prometheus --version
./prometheus

cat prometheus.yml
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'server'
    static_configs:
      - targets: ['localhost:9100']

killall -HUP prometheus

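The Graph page is in fact Prometheus's most powerful feature. Here we can query monitoring data with a special expression language that Prometheus provides, called PromQL (Prometheus Query Language). PromQL queries can be run not only on the Graph page but also through the HTTP API that Prometheus provides. Query results can be displayed either as a list or as a graph, corresponding to the Console and Graph tabs on the page.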
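As a rough illustration of querying through the HTTP API, the sketch below builds an instant-query URL for the `/api/v1/query` endpoint and parses a vector result. The helper names `build_query_url` and `parse_instant_vector` are made up for this example, and the base URL assumes the local Prometheus started above.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(response_text):
    """Return (labels, value) pairs from an instant-vector JSON response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each sample's "value" is a [timestamp, "value as string"] pair.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]
```

Fetching the URL with `urllib.request.urlopen(build_query_url("http://localhost:9090", "up")).read()` and passing the decoded body to `parse_instant_vector` would return one entry per scraped target.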

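As mentioned above, Prometheus also exposes many metrics about itself, and these can be queried on the Graph page too. Expand the dropdown next to the Execute button to see the available metric names. Pick any one of them, for example promhttp_metric_handler_requests_total, which counts the requests made to the /metrics page. Prometheus scrapes its own monitoring data through this page. Running the query shows the result in the Console tab.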

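The Web UI that Prometheus provides can display views of different metrics, but it is very basic and only suitable for debugging. A capable monitoring system also needs panels that can be customized to show different metrics and that support different visualizations (line charts, pie charts, heat maps, TopN, and so on). This is the dashboard feature. Prometheus therefore developed its own dashboard system, PromDash, but it was soon deprecated, and the project began recommending Grafana for visualizing Prometheus metrics, both because Grafana is very powerful and because it integrates with Prometheus seamlessly.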

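Grafana

Grafana is an open-source system for visualizing large volumes of measurement data. It is powerful and has a polished interface. With it you can create custom dashboards and configure, for each panel, which data to display and how to display it. It supports many data sources, such as Graphite, InfluxDB, OpenTSDB, Elasticsearch, and Prometheus, and it also supports a large number of plugins.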

wget https://dl.grafana.com/oss/release/grafana-6.0.0.linux-amd64.tar.gz
tar xvfz grafana-6.0.0.linux-amd64.tar.gz
cd grafana-6.0.0
./bin/grafana-server web

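Collecting metrics with exporters

So far we have only looked at metrics with little practical use. To use Prometheus in a real production environment, we usually need to watch many kinds of metrics, such as server CPU load, memory usage, I/O overhead, and inbound and outbound network traffic. As noted above, Prometheus obtains metric data by pulling. For Prometheus to get data from a target, a metrics collection program must first be installed on the target and expose an HTTP endpoint for Prometheus to query. This program is called an exporter, and different targets need different exporters. A large number of exporters are already available, covering almost all commonly used systems and software, and the official site lists the common ones. Exporters follow a port convention to avoid conflicts, with ports allocated sequentially starting from 9100, and a complete list of exporter ports is also maintained. Note that some software and systems need no exporter because they expose metrics in the Prometheus format themselves, for example Kubernetes, Grafana, Etcd, and Ceph.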

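Collecting server metrics

We start with server metrics, which requires installing node_exporter. This exporter collects metrics from systems with *NIX kernels. If your server runs Windows, you can use the WMI exporter instead.
Like the Prometheus server, node_exporter works out of the box: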

wget https://github.com/prometheus/node_exporter/releases/download/v0.16.0/node_exporter-0.16.0.linux-amd64.tar.gz
tar xvfz node_exporter-0.16.0.linux-amd64.tar.gz
cd node_exporter-0.16.0.linux-amd64
./node_exporter

Once node_exporter is running, request its /metrics endpoint to check that server metrics are returned correctly:

$ curl http://localhost:9100/metrics
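The response is plain text with one sample per line, plus comment lines that start with `#`. The following sketch parses such output into a dictionary. It assumes the samples carry no trailing timestamp, and the helper name `parse_metrics` is hypothetical.

```python
def parse_metrics(text):
    """Map each series (name plus labels) to its float value.

    Assumes every sample line has the form '<series> <value>' with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space, since label values may contain spaces.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples
```

For example, feeding it the body returned by the curl command above would make `parse_metrics(body)["node_load1"]` the one-minute load average.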

If everything works, edit the Prometheus configuration file and add the server to scrape_configs:

  - job_name: 'server'
    static_configs:
      - targets: ['localhost:9100']

After changing the configuration, restart the Prometheus service, or send it a HUP signal to make it reload the configuration:

$ killall -HUP prometheus

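The newly added server now appears under Status -> Targets in the Prometheus Web UI.

The metric dropdown on the Graph page now lists many metrics whose names start with node. For example, enter node_load1 to observe the server load.

To view the server metrics in Grafana, search for node exporter on the Grafana Dashboards page. Many dashboard templates can be used directly, such as Node Exporter Server Metrics or Node Exporter Full.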

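Collecting MySQL metrics

mysqld_exporter is an exporter provided officially by Prometheus. Download a release and unpack it (it works out of the box). It listens on port 9104 by default, so the matching scrape target is localhost:9104:

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz
tar xvfz mysqld_exporter-0.11.0.linux-amd64.tar.gz
cd mysqld_exporter-0.11.0.linux-amd64
# Replace <user> and <password> with the credentials of your MySQL account
export DATA_SOURCE_NAME='<user>:<password>@(localhost:3306)/'
./mysqld_exporter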

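Collecting Flink metrics

First download and start Pushgateway:

wget https://github.com/prometheus/pushgateway/releases/download/v0.9.1/pushgateway-0.9.1.linux-amd64.tar.gz
tar xvfz pushgateway-0.9.1.linux-amd64.tar.gz
cd pushgateway-0.9.1.linux-amd64
./pushgateway

Then edit flink-conf.yaml and add the parameters that integrate Flink with Pushgateway: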

metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
# Host name and port of the Pushgateway
metrics.reporter.promgateway.host: localhost
metrics.reporter.promgateway.port: 9091
# Job name (prefix) under which Flink metrics are shown, and whether to append a random suffix
metrics.reporter.promgateway.jobName: flink-metrics
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
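To show what a push to the Pushgateway looks like at the HTTP level, here is a small sketch that builds a PUT request against the `/metrics/job/<job>` path. It is not Flink's reporter implementation; the helper names and the sample metric are hypothetical, and the host, port, and job name simply mirror the values configured above.

```python
import urllib.request
from urllib.parse import quote


def pushgateway_url(host, port, job):
    """Return the Pushgateway URL that groups pushed metrics under a job name."""
    return f"http://{host}:{port}/metrics/job/{quote(job, safe='')}"


def build_push_request(host, port, job, metrics):
    """Build (but do not send) a PUT request carrying metrics in text format."""
    body = "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))
    return urllib.request.Request(
        pushgateway_url(host, port, job),
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

Passing the result of `build_push_request("localhost", 9091, "flink-metrics", {"demo_metric": 1})` to `urllib.request.urlopen` would push the sample, after which it should be visible on the Pushgateway's own /metrics page for Prometheus to pull.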

In the Prometheus configuration file, add the Pushgateway to scrape_configs:

  - job_name: 'pushgateway'
    static_configs:
      - targets: ['localhost:9091']

After changing the configuration, restart the Prometheus service, or send it a HUP signal to make it reload the configuration:

$ killall -HUP prometheus

Dashboard download location: https://github.com/percona/gr…
