Setting Up Prometheus with Docker

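Prometheus is essentially a time-series database. Combined with sub-components such as Alertmanager and Pushgateway, it can be built into a monitoring platform, and this is now a fairly mainstream approach. This article briefly introduces how to use these components and the scenarios they can be applied to.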

doc

The setup here is configured with docker-compose.

Create a docker-compose.yml file in a folder and fill in the following.

version: "3.7"

services:
  pro_server:
    image: prom/prometheus   # official image
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus    # stores Prometheus state so the next start can continue from it
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml  # mounts the Prometheus config from the host

Next, create the ./prometheus folder, then create the ./docker/prometheus.yml file and write the following into it.

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  #alertmanagers:
  #- static_configs:
  #  - targets:
  #    - pro_alert_manager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

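That is the main Prometheus service. Run docker-compose up, and you can try Prometheus out in the browser.

This configuration file is the Prometheus default. You can see that it declares one job of its own, prometheus, which scrapes its own port 9090. Take a look at the data behind the /metrics endpoint yourself to get a feel for the data structure.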

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.3e-06
go_gc_duration_seconds{quantile="0.25"} 8.8e-06
go_gc_duration_seconds{quantile="0.5"} 9.3e-06
go_gc_duration_seconds{quantile="0.75"} 0.000120499
go_gc_duration_seconds{quantile="1"} 0.000344099
go_gc_duration_seconds_sum 0.001536996
go_gc_duration_seconds_count 20
...
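To make the structure of this output concrete, here is a minimal sketch of a parser for it. It assumes only the simple line shape shown above (a name, an optional label set in braces, and a value); the function name `parse_metrics` and both regular expressions are illustrative, not part of Prometheus.

```python
import re

# Matches lines such as: name{label="value",...} 1.23  (the label part is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\s*(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # HELP / TYPE lines and blank lines carry no sample
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern does not cover
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples
```

Each sample is therefore a metric name, a set of key/value labels, and a floating-point value; the `# HELP` and `# TYPE` lines only describe the metric.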

Feel free to click around.

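This scenario works just like the default configuration above: Prometheus periodically pulls the endpoint, obtains the metrics, and writes them into its own time-series database.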
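The pull model can be sketched in a few lines. This is only an illustration of the idea, not how Prometheus is implemented: `scrape_once`, the `fetch` callable and the in-memory `store` dictionary are all hypothetical names, and the parsing assumes lines of the form "series value" with no trailing timestamp.

```python
import time


def scrape_once(fetch, store, now=None):
    """Pull one payload via `fetch` and append timestamped samples to `store`."""
    timestamp = time.time() if now is None else now
    for line in fetch().splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip comments and blank lines
        # Everything before the last space identifies the series (name plus labels).
        series, _, value = line.rpartition(' ')
        store.setdefault(series, []).append((timestamp, float(value)))
    return store
```

Calling `scrape_once` once per scrape interval with a function that returns the body of /metrics builds up one list of (timestamp, value) pairs per series, which is the essence of a time series.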

Now let's write a test endpoint of our own and configure it in Prometheus.

Using Python as an example:

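import flask
import random

app = flask.Flask(__name__)


@app.route('/metrics', methods=['GET'])
def hello():
    # Two sample series with random values, one per line, each line newline-terminated.
    body = (
        f'suzumiya{{quantile="0.75"}} {random.random()}\n'
        f'kyo{{quantile="0.5"}} {random.random()}\n'
    )
    # Serve as plain text rather than Flask's default text/html.
    return flask.Response(body, mimetype='text/plain')


if __name__ == '__main__':
    # Listen on all interfaces so the Prometheus container can reach the endpoint.
    app.run('0.0.0.0', 12300)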

This metrics endpoint imitates the data of the default endpoint. Next, add it to the Prometheus configuration file.

In ./docker/prometheus.yml, add the following under scrape_configs:

  - job_name: 'test'
    static_configs:
      - targets: ['10.23.51.15:12300']  # change this to your own LAN / public IP
        labels:
          instance: 'test'

Then restart the project. You can now find the newly added metrics on the page.
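Besides the web page, the new metric can also be checked through the Prometheus HTTP API (`/api/v1/query`). The sketch below only builds the query URL and decodes a response; the helper names `build_query_url` and `extract_values` are illustrative, and the base URL assumes the port mapping used in this article.

```python
import json
import urllib.parse


def build_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({'query': promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def extract_values(response_text):
    """Turn an instant-query JSON response into {instance label: float value}."""
    payload = json.loads(response_text)
    if payload.get('status') != 'success':
        raise ValueError('query failed')
    return {
        item['metric'].get('instance', ''): float(item['value'][1])
        for item in payload['data']['result']
    }
```

With the stack running, something like `urllib.request.urlopen(build_query_url('http://127.0.0.1:9090', 'suzumiya')).read()` should return JSON that `extract_values` can decode.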

For a monitoring platform, alerting is indispensable, and Alertmanager is the component that handles it.

Add the following to docker-compose.yml:

  pro_alert_manager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/alertmanager  # used to persist state
      - ./docker/alertmanager.yml:/etc/alertmanager/alertmanager.yml # mounts the config file from the host

Create the ./docker/alertmanager.yml file and fill in the following:

global:
  resolve_timeout: 5m
  smtp_smarthost:
  smtp_from:
  smtp_auth_username:
  smtp_auth_password:

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mememe'
receivers:
  - name: 'mememe'
    #webhook_configs:
    #- url: 'http://127.0.0.1:5001/'
    email_configs:
      - to: 'xxx@xxx.com'   # change to the recipient address
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Fill in your mail settings in the corresponding blanks above.
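The inhibit_rules entry in the configuration above is the least obvious part: a firing critical alert mutes a warning alert that shares the same alertname, dev and instance labels. A minimal sketch of that matching logic follows, assuming alerts are plain label dictionaries; `is_inhibited` is an illustrative helper, not Alertmanager's actual code.

```python
def is_inhibited(target, firing_alerts, rule):
    """Return True if `target` (a label dict) is muted by an alert in `firing_alerts`."""
    if any(target.get(k) != v for k, v in rule['target_match'].items()):
        return False  # the rule does not apply to this alert at all
    for source in firing_alerts:
        if source is target:
            continue  # an alert does not inhibit itself in this sketch
        if any(source.get(k) != v for k, v in rule['source_match'].items()):
            continue  # not a valid source alert for this rule
        # A label that is missing on both sides counts as equal here.
        if all(source.get(k, '') == target.get(k, '') for k in rule['equal']):
            return True
    return False
```

In this sketch, a warning for instance 'test' is suppressed while a critical alert with the same alertname and instance is firing, whereas a warning for a different instance still goes out.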

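Alerting is actually fully configured at this point, but nothing happens unless something triggers it. So let's configure a rule that watches the test endpoint.

Put the following into ./docker/prometheus.yml: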

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - pro_alert_manager:9093    # docker maps the service name to a hostname automatically

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "test_rule.yml"    # a config file that does not exist yet
  # - "first_rules.yml"
  # - "second_rules.yml"

Next comes the rule configuration itself. Edit docker-compose.yml and add a mount for the external rule file:

  pro_server:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./docker/test_rule.yml:/etc/prometheus/test_rule.yml   # newly added mount

Create the ./docker/test_rule.yml file and fill in the following:

groups:
  - name: test-alert
    rules:
      - alert: HttpTestDown
        expr: sum(up{job="test"}) == 0
        for: 10s
        labels:
          severity: critical

Restart the project. No alert fires at this point as long as your test server is still running. Now shut the test server down, and an email should arrive shortly.
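How quickly the alert fires depends on the rule's `for: 10s` together with the 15-second evaluation_interval. The sketch below models the usual inactive, pending, firing progression under those two settings; `alert_states` is an illustrative function and simplifies what Prometheus does internally.

```python
def alert_states(expr_results, interval=15, hold=10):
    """Yield (time, state) for each rule evaluation.

    expr_results: booleans, True when the alert expression matched at that evaluation.
    interval: seconds between evaluations (evaluation_interval).
    hold: seconds the expression must keep matching (the rule's `for`).
    """
    active_since = None
    for step, matched in enumerate(expr_results):
        now = step * interval
        if not matched:
            active_since = None  # any miss resets the alert
            yield now, 'inactive'
            continue
        if active_since is None:
            active_since = now  # first evaluation at which the expression matched
        state = 'firing' if now - active_since >= hold else 'pending'
        yield now, state
```

With these numbers the alert is pending at the first evaluation after the test server goes down and firing at the next one, 15 seconds later. Alertmanager then waits for group_wait (10s in the configuration above) before it sends the notification.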

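Pushgateway can be thought of simply as an event-tracking server: you send requests to it, and the data then reaches Prometheus, which scrapes the gateway like any other target.

Edit docker-compose.yml and add the following: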

  pro_push_gateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"
    volumes:
      - ./pushgateway:/pushgateway     # could probably be removed, since the gateway appears to be stateless

Edit ./docker/prometheus.yml and add pushgateway as a job:

  - job_name: 'pushgateway'
    static_configs:
      - targets: ['pro_push_gateway:9091']    # docker maps the service name to a hostname
        labels:
          instance: 'pushgateway'

Restart the project again, and the gateway takes effect.
There are many ways to call it. For this test we use the simplest one, curl.

echo "suzumiya 1000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test
echo "suzumiya 2000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test
echo "suzumiya 3000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test

Push a few values, and the corresponding data shows up in Prometheus.
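The same push can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl calls above; `build_push_request` is an illustrative helper, and it deliberately builds the request without sending it so it can be inspected first.

```python
import urllib.request


def build_push_request(gateway, job, metric, value):
    """Build (but do not send) a POST that pushes one sample to Pushgateway."""
    url = f"{gateway.rstrip('/')}/metrics/job/{job}"
    # Like `echo`, terminate the line with a newline.
    body = f"{metric} {value}\n".encode('utf-8')
    return urllib.request.Request(url, data=body, method='POST')
```

To actually send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_push_request('http://127.0.0.1:9091', 'test', 'suzumiya', 1000))`.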

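Building on pushgateway, a service can send its own data toward Prometheus. A fairly intuitive application is event tracking: push the events you care about to Prometheus, then monitor them with Alertmanager, or connect Grafana to build a simple time-series dashboard.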
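As a rough illustration of event tracking, a service could count events in memory and render them as a payload suitable for pushing. Everything here is hypothetical: the `EventTracker` class, the `app_events_total` metric name and the `event` label are made up for this sketch.

```python
from collections import Counter


class EventTracker:
    """Counts named events in-process and renders them as a push payload."""

    def __init__(self, metric='app_events_total'):
        self.metric = metric
        self.counts = Counter()

    def record(self, event):
        """Register one occurrence of `event`."""
        self.counts[event] += 1

    def render(self):
        """Render all counts in the text format, one labelled series per event."""
        lines = [f'# TYPE {self.metric} counter']
        for event, count in sorted(self.counts.items()):
            lines.append(f'{self.metric}{{event="{event}"}} {count}')
        return '\n'.join(lines) + '\n'
```

The string returned by `render()` could be sent as the body of a POST to the gateway's /metrics/job/<job> endpoint, in the same way as the curl examples.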

./docker-compose.yml

  
version: "3.7"

services:
  pro_server:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./docker/test_rule.yml:/etc/prometheus/test_rule.yml
  pro_push_gateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"
    volumes:
      - ./pushgateway:/pushgateway
  pro_alert_manager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/alertmanager
      - ./docker/alertmanager.yml:/etc/alertmanager/alertmanager.yml

./docker/prometheus.yml

  
# my global config
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - pro_alert_manager:9093

rule_files:
  - "test_rule.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'test'
    static_configs:
      - targets: ['10.23.51.15:12300']
        labels:
          instance: 'test'
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['pro_push_gateway:9091']
        labels:
          instance: 'pushgateway'

./docker/alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost:
  smtp_from:
  smtp_auth_username:
  smtp_auth_password:

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mememe'
receivers:
  - name: 'mememe'
    #webhook_configs:
    #- url: 'http://127.0.0.1:5001/'
    email_configs:
      - to: 'xxxx@xxx.com'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

./docker/test_rule.yml

groups:
  - name: test-alert
    rules:
      - alert: HttpTestDown
        expr: sum(up{job="test"}) == 0
        for: 10s
        labels:
          severity: critical

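The above is only a first taste, but it shows that setting up a monitoring platform is not hard. Compared with writing alerting code separately over and over, as before, this unified approach of endpoint monitoring and event tracking is more cross-platform and more elegant. There is still a lot in these components worth learning and digging into. The next time you have a monitoring requirement, it is worth asking: can Prometheus do it?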

owari.
