Container Monitoring in Practice: node-exporter

jiezi

6 years ago

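Overview
Prometheus joined the CNCF in 2016 and graduated in August 2018. It has since become the de facto standard monitoring solution for Kubernetes. The next few articles take a detailed look at Prometheus (2.x).
Prometheus can collect data from the various components of a Kubernetes cluster, such as the cAdvisor built into kubelet, the api-server, and so on. node-exporter is one of these sources.
Exporter is the general term for a class of Prometheus data-collection components. An exporter gathers data from its target and converts it into a format that Prometheus supports. Unlike traditional collection agents, it does not push data to a central server; it waits for the central server to come and scrape it. For node-exporter, the default scrape address is http://CURRENT_IP:9100/metrics
node-exporter collects machine-level runtime metrics, including basic monitoring data such as loadavg, filesystem and meminfo. It is comparable to zabbix-agent in traditional host monitoring.
node-exporter is provided and maintained by the Prometheus project. It is not bundled with Prometheus, but it is an exporter that almost every deployment needs.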
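The pull model described above can be sketched with the Go standard library alone. This is a minimal illustration, not how node-exporter itself is written: the metric name demo_load1, its value, and the helper names are hypothetical, and the scrape is simulated in-process with net/http/httptest so that no port needs to be opened.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// metricsHandler renders one hypothetical gauge in the Prometheus text format.
// A real exporter would read the value from the system instead of hard-coding it.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprintln(w, "# HELP demo_load1 Hypothetical 1-minute load average.")
	fmt.Fprintln(w, "# TYPE demo_load1 gauge")
	fmt.Fprintln(w, "demo_load1 0.42")
}

// scrapeInProcess plays the role of the Prometheus server: it issues a single
// GET /metrics against the handler and returns the response body.
func scrapeInProcess() string {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", metricsHandler)

	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(scrapeInProcess())
}
```

In a long-running process the same mux would be passed to http.ListenAndServe(":9100", mux), and the exporter would simply wait for Prometheus to call it.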
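Features
node-exporter exposes hardware and operating-system metrics provided by *NIX kernels.

On Windows, use the WMI exporter.

To collect NVIDIA GPU metrics, use prometheus-dcgm.

Collector support varies across *NIX operating systems, for example:

diskstats: supported on Darwin, Linux
cpu: supported on Darwin, Dragonfly, FreeBSD, Linux, Solaris, and others

For details, see: node_exporter
The modules that node_exporter collects can be selected on the command line. Older releases take a --collectors.enabled flag listing the modules; releases from 0.15 onward enable a module with --collector.<name> and disable one with --no-collector.<name>. If nothing is specified, the default set is used.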
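As a small illustration of the per-collector flag style, the sketch below builds a node_exporter command line from a map of collector names. The helper collectorFlags is hypothetical; the collector names are taken from the help output in this article.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// collectorFlags turns "collector name -> enabled?" into flags of the form
// --collector.<name> or --no-collector.<name>, sorted by name so the
// resulting command line is stable.
func collectorFlags(enabled map[string]bool) []string {
	names := make([]string, 0, len(enabled))
	for name := range enabled {
		names = append(names, name)
	}
	sort.Strings(names)

	flags := make([]string, 0, len(names))
	for _, name := range names {
		if enabled[name] {
			flags = append(flags, "--collector."+name)
		} else {
			flags = append(flags, "--no-collector."+name)
		}
	}
	return flags
}

func main() {
	flags := collectorFlags(map[string]bool{"diskstats": true, "ntp": false})
	fmt.Println("./node_exporter " + strings.Join(flags, " "))
}
```

Collectors that are not mentioned keep their default state, so only the deviations from the defaults need to be listed.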
Deployment
Binary deployment:

Download: from https://github.com/prometheus…

Extract the archive: tar -xvzf **.tar.gz
Start it: ./node_exporter

Run ./node_exporter -h to view the help:
usage: node_exporter [<flags>]

Flags:
  -h, --help
      --collector.diskstats.ignored-devices
      --collector.filesystem.ignored-mount-points
      --collector.filesystem.ignored-fs-types
      --collector.netdev.ignored-devices
      --collector.netstat.fields
      --collector.ntp.server="127.0.0.1"
      …..
Once ./node_exporter is running, open http://${IP}:9100/metrics to see the list of exposed metrics.
Docker installation:
docker run -d \
  --net="host" \
  --pid="host" \
  -v "/:/host:ro,rslave" \
  quay.io/prometheus/node-exporter \
  --path.rootfs /host
Installation in k8s:
The node-exporter.yaml file:
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: node-exporter
    name: node-exporter
  name: node-exporter
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP
---
# Note: extensions/v1beta1 DaemonSet was removed in Kubernetes 1.16.
# On newer clusters use apps/v1 and add a matching spec.selector.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/tryk8s/node-exporter:latest
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
      hostNetwork: true
      hostPID: true
kubectl create -f node-exporter.yaml
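This creates a DaemonSet and a Service object. After deployment, the Prometheus configuration file has to be updated so that Prometheus can pull monitoring data from the node exporter. Edit prometheus.yml and add the following under the scrape_configs node: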
scrape_configs:
  # Scrape node exporter monitoring data
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
Alternatively, the prometheus.io/scrape: 'true' annotation can be used to discover the Service's metrics endpoint automatically:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

After the configuration is in place, restart Prometheus and the corresponding metrics will appear.
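The link between the prometheus.io/scrape annotation and the __meta_kubernetes_service_annotation_prometheus_io_scrape label can be sketched as follows. The helper names are hypothetical, and the "keep only when the value is true" rule is an assumption based on the common convention; the article shows only the source_labels line, not the full relabel rule.

```go
package main

import (
	"fmt"
	"regexp"
)

// Characters that are not allowed in a label name are replaced by "_"
// when an annotation name is turned into a discovery meta label.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// annotationMetaLabel returns the meta label name that Kubernetes service
// discovery attaches for a given Service annotation.
func annotationMetaLabel(annotation string) string {
	return "__meta_kubernetes_service_annotation_" +
		invalidLabelChars.ReplaceAllString(annotation, "_")
}

// shouldScrape mimics a relabel rule that keeps a target only when the
// annotation value is exactly "true" (assumed rule, see the note above).
func shouldScrape(annotations map[string]string) bool {
	return annotations["prometheus.io/scrape"] == "true"
}

func main() {
	fmt.Println(annotationMetaLabel("prometheus.io/scrape"))
	fmt.Println(shouldScrape(map[string]string{"prometheus.io/scrape": "true"}))
}
```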
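Viewing metrics:
Direct access:
With a binary or Docker deployment, once it is running you can open: http://${IP}:9100/metrics
The response has the format below and contains every metric that node-exporter exposes: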
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.1872e-05
go_gc_duration_seconds{quantile="0.25"} 0.000119463
go_gc_duration_seconds{quantile="0.5"} 0.000151156
go_gc_duration_seconds{quantile="0.75"} 0.000198764
go_gc_duration_seconds{quantile="1"} 0.009889647
go_gc_duration_seconds_sum 0.257232201
go_gc_duration_seconds_count 1187
# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="guest_nice"} 0
node_cpu{cpu="cpu0",mode="idle"} 68859.19
node_cpu{cpu="cpu0",mode="iowait"} 167.22
node_cpu{cpu="cpu0",mode="irq"} 0
node_cpu{cpu="cpu0",mode="nice"} 19.92
node_cpu{cpu="cpu0",mode="softirq"} 17.05
node_cpu{cpu="cpu0",mode="steal"} 28.1
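Each non-comment line in this output is a sample: a metric name, an optional label set in braces, and a value. The sketch below parses one such line. It is a deliberately simplified parser with hypothetical names (Sample, parseSample), and it assumes that label values contain no commas or escaped quotes and that lines carry no trailing timestamp.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one parsed line of the text exposition format.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSample parses lines such as
//   node_cpu{cpu="cpu0",mode="idle"} 68859.19
// under the simplifying assumptions stated in the lead-in.
func parseSample(line string) (Sample, error) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return Sample{}, fmt.Errorf("not a sample line: %q", line)
	}
	// The value is the last space-separated field.
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return Sample{}, fmt.Errorf("missing value: %q", line)
	}
	value, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return Sample{}, err
	}
	head := strings.TrimSpace(line[:sp])
	s := Sample{Name: head, Labels: map[string]string{}, Value: value}

	open := strings.Index(head, "{")
	if open < 0 {
		return s, nil // no labels
	}
	if !strings.HasSuffix(head, "}") {
		return Sample{}, fmt.Errorf("unterminated label set: %q", line)
	}
	s.Name = head[:open]
	for _, pair := range strings.Split(head[open+1:len(head)-1], ",") {
		if pair == "" {
			continue
		}
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return Sample{}, fmt.Errorf("malformed label %q", pair)
		}
		s.Labels[strings.TrimSpace(kv[0])] = strings.Trim(kv[1], `"`)
	}
	return s, nil
}

func main() {
	s, err := parseSample(`node_cpu{cpu="cpu0",mode="idle"} 68859.19`)
	fmt.Println(s.Name, s.Labels, s.Value, err)
}
```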
Viewing in Prometheus:
Names such as go_gc_duration_seconds and node_cpu are metric names. If Prometheus is in use, these metrics can be searched for on the http://${IP}:9090/ page:

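Commonly used metrics include:
node_cpu: system CPU usage
node_disk*: disk I/O
node_filesystem*: filesystem usage
node_load1: system load
node_memory*: memory usage
node_network*: network bandwidth
node_time: current system time
go_*: Go runtime metrics of node exporter
process_*: runtime metrics of the node exporter process itself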
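Because node_cpu is a counter of cumulative seconds per mode, CPU usage has to be derived from the difference between two scrapes. The sketch below shows that arithmetic for a single CPU. The type and function names are hypothetical and the sample values are made up for the illustration.

```go
package main

import "fmt"

// cpuSeconds maps a CPU mode to cumulative seconds, mirroring the
// node_cpu{mode="..."} samples of one CPU at one scrape.
type cpuSeconds map[string]float64

// busyFraction returns the share of CPU time between two scrapes that was
// not spent in the "idle" mode: 1 - delta(idle) / delta(all modes).
func busyFraction(prev, cur cpuSeconds) (float64, error) {
	var total, idle float64
	for mode, v := range cur {
		delta := v - prev[mode]
		if delta < 0 {
			return 0, fmt.Errorf("counter reset detected for mode %q", mode)
		}
		total += delta
		if mode == "idle" {
			idle += delta
		}
	}
	if total == 0 {
		return 0, fmt.Errorf("no CPU time elapsed between the two samples")
	}
	return 1 - idle/total, nil
}

func main() {
	// Hypothetical values from two scrapes taken some seconds apart.
	prev := cpuSeconds{"idle": 100, "user": 50, "system": 10}
	cur := cpuSeconds{"idle": 175, "user": 70, "system": 15}
	busy, err := busyFraction(prev, cur)
	fmt.Println(busy, err)
}
```

In Prometheus itself the same idea is normally expressed as a rate over the counter rather than computed by hand.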
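Viewing in Grafana:
Prometheus ships with its own web UI, but metrics are usually visualized with Grafana, which is better suited to the job. Grafana has many templates that present metrics in a friendlier way, such as Node Exporter for Prometheus.

After configuring the variables in Grafana and importing the template, you get the result shown in the figure above.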
In-depth look
node-exporter is one of the exporters officially recommended by Prometheus. Similar ones include:

HAProxy exporter
Collectd exporter
SNMP exporter
MySQL server exporter
….

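The officially recommended exporters all live under https://github.com/prometheus. The exporter recommendation page also lists many third-party exporters developed and published by individuals or organizations. If you have custom collection needs, you can write your own exporter; for a concrete example, see the later [Custom Exporter] article.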
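Version issues
Because node_exporter is a fairly old component, some best practices were not merged in from the start, such as conforming to the Prometheus naming conventions (https://prometheus.io/docs/pr…). As of January 2019 the latest version is 0.17.
Some metric names changed as of 0.16 (detailed comparison):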

* node_cpu -> node_cpu_seconds_total
* node_memory_MemTotal -> node_memory_MemTotal_bytes
* node_memory_MemFree -> node_memory_MemFree_bytes
* node_filesystem_avail -> node_filesystem_avail_bytes
* node_filesystem_size -> node_filesystem_size_bytes
* node_disk_io_time_ms -> node_disk_io_time_seconds_total
* node_disk_reads_completed -> node_disk_reads_completed_total
* node_disk_sectors_written -> node_disk_written_bytes_total
* node_time -> node_time_seconds
* node_boot_time -> node_boot_time_seconds
* node_intr -> node_intr_total

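There are two ways to deal with the version issue:

First, run both versions of node-exporter on the machine and have Prometheus scrape both.
Second, use a metric converter, which translates the old metric names into the new ones.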
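The second option can be sketched as a lookup table. This is a hypothetical converter, not the tool the article refers to, and it covers only part of the rename list above. Note that some renames also change the unit: node_disk_io_time_ms is in milliseconds while its replacement is in seconds, so the value must be divided as well. node_disk_sectors_written is left out because converting sectors to bytes needs a sector size that the article does not give.

```go
package main

import "fmt"

// rename describes how one old metric maps to its new name.
// divisor converts the old unit to the new one (1 means unchanged).
type rename struct {
	newName string
	divisor float64
}

// renames holds a subset of the old -> new names listed in the article.
var renames = map[string]rename{
	"node_cpu":                  {"node_cpu_seconds_total", 1},
	"node_memory_MemTotal":      {"node_memory_MemTotal_bytes", 1},
	"node_memory_MemFree":       {"node_memory_MemFree_bytes", 1},
	"node_filesystem_avail":     {"node_filesystem_avail_bytes", 1},
	"node_filesystem_size":      {"node_filesystem_size_bytes", 1},
	"node_disk_io_time_ms":      {"node_disk_io_time_seconds_total", 1000}, // ms -> s
	"node_disk_reads_completed": {"node_disk_reads_completed_total", 1},
	"node_time":                 {"node_time_seconds", 1},
	"node_boot_time":            {"node_boot_time_seconds", 1},
	"node_intr":                 {"node_intr_total", 1},
}

// convert returns the new name and value for an old-style sample.
// Unknown metrics are passed through unchanged.
func convert(name string, value float64) (string, float64) {
	r, ok := renames[name]
	if !ok {
		return name, value
	}
	return r.newName, value / r.divisor
}

func main() {
	name, value := convert("node_disk_io_time_ms", 2000)
	fmt.Println(name, value)
}
```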

For Grafana, look for dashboard templates that support both sets of metric names.
Collector
The beginning of node-exporter's collector package:
// Package collector includes all individual collectors to gather and export system metrics.
package collector

import (
	"fmt"
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/common/log"
	"gopkg.in/alecthomas/kingpin.v2"
)

// Namespace defines the common namespace to be used by all metrics.
const namespace = "node"

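As you can see, implementing an exporter requires importing the github.com/prometheus/client_golang/prometheus library. client_golang is the official Prometheus Go library; it can be used to instrument existing applications and also serves as the base library for talking to the Prometheus HTTP API.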
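The sync and time imports hint at how the package is organized: individual collectors are run side by side and timed. The sketch below illustrates that pattern with the standard library only. It is a simplified stand-in, not the actual node-exporter code: the Collector interface, the two demo collectors and collectAll are hypothetical, and plain maps replace the client_golang metric types.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// namespace prefixes every metric name, as in the excerpt above.
const namespace = "node"

// Collector is a simplified, hypothetical per-subsystem collector.
type Collector interface {
	Update() (map[string]float64, error)
}

type loadCollector struct{}

// Update returns a fixed demo value instead of reading the real load average.
func (loadCollector) Update() (map[string]float64, error) {
	return map[string]float64{namespace + "_load1": 0.42}, nil
}

type timeCollector struct{}

func (timeCollector) Update() (map[string]float64, error) {
	return map[string]float64{namespace + "_time": float64(time.Now().Unix())}, nil
}

// collectAll runs all collectors concurrently, merges their metrics and
// records per-collector duration and success.
func collectAll(collectors map[string]Collector) map[string]float64 {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	out := make(map[string]float64)
	for name, c := range collectors {
		wg.Add(1)
		go func(name string, c Collector) {
			defer wg.Done()
			begin := time.Now()
			metrics, err := c.Update()
			elapsed := time.Since(begin).Seconds()
			success := 1.0
			if err != nil {
				success = 0
			}
			mu.Lock()
			defer mu.Unlock()
			for k, v := range metrics {
				out[k] = v
			}
			out[fmt.Sprintf("%s_scrape_collector_duration_seconds{collector=%q}", namespace, name)] = elapsed
			out[fmt.Sprintf("%s_scrape_collector_success{collector=%q}", namespace, name)] = success
		}(name, c)
	}
	wg.Wait()
	return out
}

func main() {
	out := collectAll(map[string]Collector{"loadavg": loadCollector{}, "time": timeCollector{}})
	keys := make([]string, 0, len(out))
	for k := range out {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, out[k])
	}
}
```

Running collectors in separate goroutines means one slow subsystem does not delay the others, and the success value makes a failing collector visible in the scrape itself.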
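For example, it defines the basic metric types and their methods:
Counter: collects monotonically increasing data, such as event counts
Gauge: collects a current state that can go up or down, such as the number of database connections
Histogram: samples observations such as response latency into configurable buckets
Summary: also samples observations such as response latency, similar to Histogram, but reports quantiles computed on the client side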
switch metricType {
case dto.MetricType_COUNTER:
	valType = prometheus.CounterValue
	val = metric.Counter.GetValue()

case dto.MetricType_GAUGE:
	valType = prometheus.GaugeValue
	val = metric.Gauge.GetValue()

case dto.MetricType_UNTYPED:
	valType = prometheus.UntypedValue
	val = metric.Untyped.GetValue()
	// ... remaining cases omitted in this excerpt
}
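To make the difference between the metric types more concrete, here is a standard-library sketch of a counter and a histogram. These are hypothetical stand-ins for illustration only, not the client_golang implementations, and unlike the real types they are not safe for concurrent use.

```go
package main

import "fmt"

// Counter only ever goes up.
type Counter struct{ value float64 }

// Inc adds one to the counter.
func (c *Counter) Inc() { c.value++ }

// Histogram counts observations in cumulative buckets: the bucket with upper
// bound b counts every observation <= b, and the last bucket is +Inf.
type Histogram struct {
	bounds []float64 // ascending upper bounds
	counts []uint64  // one entry per bound, plus a final +Inf entry
	sum    float64
}

// NewHistogram creates a histogram with the given ascending upper bounds.
func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Observe records one value, for example a response latency in seconds.
func (h *Histogram) Observe(v float64) {
	h.sum += v
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.counts[len(h.bounds)]++ // every observation falls into +Inf
}

func main() {
	var requests Counter
	latency := NewHistogram(0.1, 0.5, 1)
	for _, v := range []float64{0.05, 0.3, 2} {
		requests.Inc()
		latency.Observe(v)
	}
	fmt.Println(requests.value, latency.counts, latency.sum)
}
```

A gauge would differ from the counter only in also allowing the value to be set or decreased.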
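For a detailed walkthrough of the client_golang library, see: theory-source-code
This article is part of the Container Monitoring in Practice series; the full content is at: container-monitor-book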