Preface

As the number of components, instances, and metrics monitored by Prometheus keeps growing, Prometheus needs more and more compute, and its storage footprint keeps increasing.

In this situation, the first thing that comes to mind for improving Prometheus performance and reducing storage usage is usually one of the Prometheus-compatible storage solutions, such as Thanos, VictoriaMetrics (VM), or Mimir. In practice, however, although centralized storage, long-term storage, downsampling, and compression can alleviate the problem to some degree, they treat the symptoms rather than the root cause.

  • The real root cause is that the number of metric series is simply too large.
  • The fundamental fix, therefore, is to reduce the number of series. There are two ways to do this:

    • Prometheus performance tuning - tackling high-cardinality problems
    • Based on actual usage, keep only the metrics that are used by dashboards (Grafana Dashboards) and alerts (Prometheus rules).

This post focuses on the second approach: how to trim Prometheus metrics and storage based on actual usage.

Approach

  1. Analyze all metric names currently stored in Prometheus;
  2. Analyze all metric names used for visualization, i.e. every metric referenced by Grafana Dashboards;
  3. Analyze all metric names used for alerting, i.e. every metric referenced in the Prometheus rule configuration;
  4. (Optional) Analyze all metric names used for ad-hoc diagnosis, i.e. the metrics frequently queried in the Prometheus UI (see the query-log sketch after this list);
  5. Use relabeling (metric_relabel_configs or write_relabel_configs) to keep only the metrics found in steps 2-4, thereby drastically reducing the number of series Prometheus has to store.
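
For optional step 4, Prometheus's own query log can help. A minimal sketch, assuming Prometheus 2.16 or later; the log path is just an example:

# prometheus.yml
global:
  # Log every PromQL query handled by the engine as JSON lines to this file;
  # review it later to see which metrics are queried ad hoc (e.g. from the UI).
  query_log_file: /prometheus/query.log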

To put this idea into practice, Grafana Labs' mimirtool gets the job done.

Here is a before-and-after comparison from my environment to show how dramatic the effect can be (the sketch after this list shows how to check the active-series count on your own Prometheus):

  1. Before trimming: 270336 active series
  2. After trimming: 61055 active series
  3. Result: roughly a 4.4x reduction, close to 5x!
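
The active-series count can be checked directly against Prometheus via its built-in self-monitoring metric; a quick sketch (the address is the Prometheus endpoint used later in this post, adjust to your environment):

# prometheus_tsdb_head_series is the number of series currently in the TSDB head (active series)
curl -s 'http://172.16.0.20:30090/api/v1/query?query=prometheus_tsdb_head_series' \
  | jq -r '.data.result[0].value[1]'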

Grafana Mimirtool

Grafana Mimir is a long-term storage solution for Prometheus backed by object storage, evolved from Cortex. It officially claims to support ingesting, storing, and querying series at the scale of hundreds of millions.

Grafana Mimirtool is a utility released as part of Mimir that can also be used on its own.

Grafana Mimirtool can extract metrics from the following sources:

  • Grafana Dashboards in a Grafana instance (via the Grafana API)
  • Prometheus alerting and recording rules in a Mimir instance
  • Grafana Dashboard JSON files
  • Prometheus alerting and recording rule YAML files

Grafana Mimirtool can then compare the extracted metrics with the active series in a Prometheus or Grafana Cloud Prometheus instance and output a list of used and unused metrics.

Hands-on: trimming Prometheus metrics

Assumptions

Assume that:

  • Prometheus is installed via kube-prometheus-stack
  • Grafana is installed and used as the visualization frontend
  • The relevant alerting rules have been configured
  • Beyond these, there are no other metrics that need to be kept

Prerequisites

  1. Grafana Mimirtool: download the mimirtool binary for your platform from the releases page and it is ready to use (see the download sketch after this list);
  2. A Grafana API token has been created;
  3. Prometheus is installed and configured.
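
A quick download sketch, assuming a linux-amd64 host; the release tag and asset name here are examples and should be checked against the mimir releases page:

# Pick an actual release from https://github.com/grafana/mimir/releases
VERSION=2.10.0
curl -fLo mimirtool "https://github.com/grafana/mimir/releases/download/mimir-${VERSION}/mimirtool-linux-amd64"
chmod +x mimirtool
./mimirtool --help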

Step 1: analyze the metrics used by Grafana Dashboards

Via the Grafana API

The command is as follows:

# Analyze the metrics used by Grafana via the Grafana API
# Prerequisite: an API key has already been created in Grafana
mimirtool analyze grafana --address http://172.16.0.20:32651 --key=eyJrIjoiYjBWMGVoTHZTY3BnM3V5UzNVem9iWDBDSG5sdFRxRVoiLCJuIjoibWltaXJ0b29sIiwiaWQiOjF9

Notes:

  • http://172.16.0.20:32651 is the Grafana address
  • --key=eyJr... is the Grafana API token. It can be obtained from the API Keys page in the Grafana UI, or created via the API as sketched below:
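
If you prefer the command line, older Grafana versions expose a legacy API-keys endpoint (newer Grafana versions use service-account tokens instead); a hedged sketch:

# Creates an API key named "mimirtool" with the Viewer role (enough for reading dashboards).
# <admin-password> is a placeholder; the "key" field of the response is what you pass to --key.
curl -s -X POST http://172.16.0.20:32651/api/auth/keys \
  -H 'Content-Type: application/json' \
  -u admin:<admin-password> \
  -d '{"name": "mimirtool", "role": "Viewer"}'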

The result is a metrics-in-grafana.json file; its content looks roughly like this:

{
    "metricsUsed": [
        ":node_memory_MemAvailable_bytes:sum",
        "alertmanager_alerts",
        "alertmanager_alerts_invalid_total",
        "alertmanager_alerts_received_total",
        "alertmanager_notification_latency_seconds_bucket",
        "alertmanager_notification_latency_seconds_count",
        "alertmanager_notification_latency_seconds_sum",
        "alertmanager_notifications_failed_total",
        "alertmanager_notifications_total",
        "cluster",
        "cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits",
        "cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests",
        "cluster:namespace:pod_memory:active:kube_pod_container_resource_limits",
        "cluster:namespace:pod_memory:active:kube_pod_container_resource_requests",
        "cluster:node_cpu:ratio_rate5m",
        "container_cpu_cfs_periods_total",
        "container_cpu_cfs_throttled_periods_total",
        "..."
    ],
    "dashboards": [
        {
            "slug": "",
            "uid": "alertmanager-overview",
            "title": "Alertmanager / Overview",
            "metrics": [
                "alertmanager_alerts",
                "alertmanager_alerts_invalid_total",
                "alertmanager_alerts_received_total",
                "alertmanager_notification_latency_seconds_bucket",
                "alertmanager_notification_latency_seconds_count",
                "alertmanager_notification_latency_seconds_sum",
                "alertmanager_notifications_failed_total",
                "alertmanager_notifications_total"
            ],
            "parse_errors": null
        },
        {
            "slug": "",
            "uid": "c2f4e12cdf69feb95caa41a5a1b423d9",
            "title": "etcd",
            "metrics": [
                "etcd_disk_backend_commit_duration_seconds_bucket",
                "etcd_disk_wal_fsync_duration_seconds_bucket",
                "etcd_mvcc_db_total_size_in_bytes",
                "etcd_network_client_grpc_received_bytes_total",
                "etcd_network_client_grpc_sent_bytes_total",
                "etcd_network_peer_received_bytes_total",
                "etcd_network_peer_sent_bytes_total",
                "etcd_server_has_leader",
                "etcd_server_leader_changes_seen_total",
                "etcd_server_proposals_applied_total",
                "etcd_server_proposals_committed_total",
                "etcd_server_proposals_failed_total",
                "etcd_server_proposals_pending",
                "grpc_server_handled_total",
                "grpc_server_started_total",
                "process_resident_memory_bytes"
            ],
            "parse_errors": null
        },
        {...}
    ]
}

(Optional) Via Grafana Dashboard JSON files

If you cannot create a Grafana API token, you can still run the analysis as long as you have the Grafana Dashboard JSON files. Example:

# Analyze the metrics used by Grafana from dashboard JSON files
mimirtool analyze dashboard grafana_dashboards/blackboxexporter-probe.json
mimirtool analyze dashboard grafana_dashboards/es.json

The resulting JSON has the same structure as in the previous section, so it is not repeated here; if you first need to export a dashboard's JSON from Grafana, see the sketch below.
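
A sketch for exporting a dashboard's JSON by UID via the Grafana HTTP API and analyzing it; the UID used here is the Alertmanager dashboard from the output above, and the token is assumed to be the same one created earlier:

curl -s -H 'Authorization: Bearer <API token>' \
  http://172.16.0.20:32651/api/dashboards/uid/alertmanager-overview \
  | jq '.dashboard' > alertmanager-overview.json
mimirtool analyze dashboard alertmanager-overview.json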

Step 2: analyze the metrics used by Prometheus alerting and recording rules

The steps are as follows:

# (Optional) Copy the rule files in use to the local machine with kubectl cp
kubectl cp <prompod>:/etc/prometheus/rules/<releasename>-kube-prometheus-st-prometheus-rulefiles-0 -c prometheus ./kube-prometheus-stack/rulefiles/

# Analyze the metrics used by Prometheus rule files (covers both recording and alerting rules)
mimirtool analyze rule-file ./kube-prometheus-stack/rulefiles/*

The result is metrics-in-ruler.json:

{
  "metricsUsed": [
    "ALERTS",
    "aggregator_unavailable_apiservice",
    "aggregator_unavailable_apiservice_total",
    "apiserver_client_certificate_expiration_seconds_bucket",
    "apiserver_client_certificate_expiration_seconds_count",
    "apiserver_request_terminations_total",
    "apiserver_request_total",
    "blackbox_exporter_config_last_reload_successful",
    "..."
  ],
  "ruleGroups": [
    {
      "namspace": "default-monitor-kube-prometheus-st-kubernetes-apps-ae2b16e5-41d8-4069-9297-075c28c6969e",
      "name": "kubernetes-apps",
      "metrics": [
        "kube_daemonset_status_current_number_scheduled",
        "kube_daemonset_status_desired_number_scheduled",
        "kube_daemonset_status_number_available",
        "kube_daemonset_status_number_misscheduled",
        "kube_daemonset_status_updated_number_scheduled",
        "..."
      ],
      "parse_errors": null
    },
    {
      "namspace": "default-monitor-kube-prometheus-st-kubernetes-resources-ccb4a7bc-f2a0-4fe4-87f7-0b000468f18f",
      "name": "kubernetes-resources",
      "metrics": [
        "container_cpu_cfs_periods_total",
        "container_cpu_cfs_throttled_periods_total",
        "kube_node_status_allocatable",
        "kube_resourcequota",
        "namespace_cpu:kube_pod_container_resource_requests:sum",
        "namespace_memory:kube_pod_container_resource_requests:sum"
      ],
      "parse_errors": null
    },
    {...}
  ]
}

Step 3: analyze which metrics are unused

The command is as follows:

# Compare everything Prometheus scrapes vs. what is used by dashboards (Grafana) and by recording/alerting rules
mimirtool analyze prometheus --address=http://172.16.0.20:30090/ --grafana-metrics-file="metrics-in-grafana.json" --ruler-metrics-file="metrics-in-ruler.json"

Notes:

  • --address=http://172.16.0.20:30090/ is the Prometheus address
  • --grafana-metrics-file="metrics-in-grafana.json" is the JSON file produced in step 1
  • --ruler-metrics-file="metrics-in-ruler.json" is the JSON file produced in step 2

The output, prometheus-metrics.json, looks like this:

{
  "total_active_series": 270336,
  "in_use_active_series": 61055,
  "additional_active_series": 209281,
  "in_use_metric_counts": [
    {
      "metric": "rest_client_request_duration_seconds_bucket",
      "count": 8855,
      "job_counts": [
        {
          "job": "kubelet",
          "count": 4840
        },
        {
          "job": "kube-controller-manager",
          "count": 1958
        },
        {...}
      ]
    },
    {
      "metric": "grpc_server_handled_total",
      "count": 4394,
      "job_counts": [
        {
          "job": "kube-etcd",
          "count": 4386
        },
        {
          "job": "default/kubernetes-ebao-ebaoops-pods",
          "count": 8
        }
      ]
    },
    {...}
  ],
  "additional_metric_counts": [
    {
      "metric": "rest_client_rate_limiter_duration_seconds_bucket",
      "count": 81917,
      "job_counts": [
        {
          "job": "kubelet",
          "count": 53966
        },
        {
          "job": "kube-proxy",
          "count": 23595
        },
        {
          "job": "kube-scheduler",
          "count": 2398
        },
        {
          "job": "kube-controller-manager",
          "count": 1958
        }
      ]
    },
    {
      "metric": "rest_client_rate_limiter_duration_seconds_count",
      "count": 7447,
      "job_counts": [
        {
          "job": "kubelet",
          "count": 4906
        },
        {
          "job": "kube-proxy",
          "count": 2145
        },
        {
          "job": "kube-scheduler",
          "count": 218
        },
        {
          "job": "kube-controller-manager",
          "count": 178
        }
      ]
    },
    {...}
  ]
}
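
To quickly see which unused metrics cost the most, a jq one-liner over this file works (field names as in the output above):

# Top 20 unused metrics by active-series count
jq -r '.additional_metric_counts[] | "\(.count)\t\(.metric)"' prometheus-metrics.json \
  | sort -rn | head -20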

Step 4: keep only the metrics that are used

Configuration in write_relabel_configs

If you use remote_write, the simple and blunt approach is to configure a keep relabel rule directly in the write_relabel_configs block.

First, use jq to get all the metric names that need to be kept:

jq '.metricsUsed' metrics-in-grafana.json \
| tr -d '", ' \
| sed '1d;$d' \
| grep -v 'grafanacloud*' \
| paste -s -d '|' -

The output looks like this:

instance:node_cpu_utilisation:rate1m|instance:node_load1_per_cpu:ratio|instance:node_memory_utilisation:ratio|instance:node_network_receive_bytes_excluding_lo:rate1m|instance:node_network_receive_drop_excluding_lo:rate1m|instance:node_network_transmit_bytes_excluding_lo:rate1m|instance:node_network_transmit_drop_excluding_lo:rate1m|instance:node_vmstat_pgmajfault:rate1m|instance_device:node_disk_io_time_seconds:rate1m|instance_device:node_disk_io_time_weighted_seconds:rate1m|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_size_bytes|node_load1|node_load15|node_load5|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_uname_info|up
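
Note that the pipeline above only covers the dashboard metrics. If the keep list should also include the rule metrics from step 2, a variant like this merges both files (assuming both follow the metricsUsed structure shown earlier):

jq -r '.metricsUsed[]' metrics-in-grafana.json metrics-in-ruler.json \
  | grep -v 'grafanacloud' \
  | sort -u \
  | paste -s -d '|' -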

Then configure the keep relabel rule in the write_relabel_configs block:

remote_write:
- url: <remote_write endpoint>
  basic_auth:
    username: <as needed>
    password: <as needed>
  write_relabel_configs:
  - source_labels: [__name__]
    regex: instance:node_cpu_utilisation:rate1m|instance:node_load1_per_cpu:ratio|instance:node_memory_utilisation:ratio|instance:node_network_receive_bytes_excluding_lo:rate1m|instance:node_network_receive_drop_excluding_lo:rate1m|instance:node_network_transmit_bytes_excluding_lo:rate1m|instance:node_network_transmit_drop_excluding_lo:rate1m|instance:node_vmstat_pgmajfault:rate1m|instance_device:node_disk_io_time_seconds:rate1m|instance_device:node_disk_io_time_weighted_seconds:rate1m|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_filesystem_avail_bytes|node_filesystem_size_bytes|node_load1|node_load15|node_load5|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemAvailable_bytes|node_memory_MemFree_bytes|node_memory_MemTotal_bytes|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_uname_info|up
    action: keep

Configuration in metric_relabel_configs

If you are not using remote_write, then the only place to do this is the metric_relabel_configs block.

Take the etcd job as an example (shown here as plain Prometheus configuration; if you use the Prometheus Operator, adjust accordingly, as sketched after the example below):

- job_name: serviceMonitor/default/monitor-kube-prometheus-st-kube-etcd/0
  honor_labels: false
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-system
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: /etc/prometheus/secrets/etcd-certs/ca.crt
    cert_file: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
    key_file: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
  relabel_configs:
  - source_labels:
    - job
    target_label: __tmp_prometheus_job_name
  - ...
  metric_relabel_configs:
  - source_labels: [__name__]
    regex: etcd_disk_backend_commit_duration_seconds_bucket|etcd_disk_wal_fsync_duration_seconds_bucket|etcd_mvcc_db_total_size_in_bytes|etcd_network_client_grpc_received_bytes_total|etcd_network_client_grpc_sent_bytes_total|etcd_network_peer_received_bytes_total|etcd_network_peer_sent_bytes_total|etcd_server_has_leader|etcd_server_leader_changes_seen_total|etcd_server_proposals_applied_total|etcd_server_proposals_committed_total|etcd_server_proposals_failed_total|etcd_server_proposals_pending|grpc_server_handled_total|grpc_server_started_total|process_resident_memory_bytes|etcd_http_failed_total|etcd_http_received_total|etcd_http_successful_duration_seconds_bucket|etcd_network_peer_round_trip_time_seconds_bucket|grpc_server_handling_seconds_bucket|up
    action: keep
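
With the Prometheus Operator, the equivalent rule lives in the ServiceMonitor rather than in prometheus.yml. A trimmed sketch: the object names follow the kube-prometheus-stack defaults used above, while the port name and the shortened regex are placeholders to be taken from your actual ServiceMonitor and the analysis output:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitor-kube-prometheus-st-kube-etcd
  namespace: default
spec:
  endpoints:
  - port: http-metrics            # use the port name from your existing ServiceMonitor
    scheme: https
    metricRelabelings:            # rendered into metric_relabel_configs by the Operator
    - sourceLabels: [__name__]
      regex: etcd_disk_backend_commit_duration_seconds_bucket|etcd_server_has_leader|up  # abbreviated; paste the full list from above
      action: keep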

Using drop instead of keep

Likewise, using drop rules instead of keep rules also works; the mechanics are the same, so only a minimal sketch follows.
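
For instance, a minimal sketch that drops the two heaviest unused metrics found in step 3, instead of keeping a whitelist:

metric_relabel_configs:
- source_labels: [__name__]
  regex: rest_client_rate_limiter_duration_seconds_bucket|rest_client_rate_limiter_duration_seconds_count
  action: drop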

Summary

This post covered the motivation for trimming Prometheus metrics, showed how to use mimirtool analyze to determine which metrics are used by Grafana Dashboards and Prometheus rules, used analyze prometheus to split the active series into used and unused sets with respect to dashboards and alerts, and finally configured Prometheus to keep only the metrics that are actually used.

In this exercise the reduction was roughly 4-5x, which is quite significant. Well worth a try.

References

  • grafana/mimir: Grafana Mimir provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus. (github.com)
  • Analyzing and reducing metrics usage with Grafana Mimirtool | Grafana Cloud documentation

Where three people walk together, one of them can be my teacher; knowledge shared belongs to all. This article was written by the EWhisper.cn tech blog (东风微鸣).