Prometheus is essentially a time-series database; pair it with subcomponents such as Alertmanager and Pushgateway and you can build a monitoring platform. This is currently a fairly mainstream approach. This article gives a brief introduction to using these components and the scenarios they can be applied to.
Official documentation
doc
Docker setup
Everything is configured via docker-compose.
prometheus
Basic configuration
Create a docker-compose.yml file in a folder and fill it with the following.
version: "3.7"
services:
  pro_server:
    image: prom/prometheus  # official image
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus  # stores Prometheus state so it carries over to the next start
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml  # pass the Prometheus config in from outside
Next, create the ./prometheus
folder, then create the ./docker/prometheus.yml
file and write the following into it.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  #alertmanagers:
  #- static_configs:
  #  - targets:
  #    - pro_alert_manager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']
With the main Prometheus service in place, run docker-compose up
and you can get a first taste of Prometheus in the browser (port 9090).
This config file is the Prometheus default. You can see it declares a job named prometheus
that scrapes its own port 9090. Take a look at the data on the /metrics endpoint yourself to get a feel for the data structure.
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.3e-06
go_gc_duration_seconds{quantile="0.25"} 8.8e-06
go_gc_duration_seconds{quantile="0.5"} 9.3e-06
go_gc_duration_seconds{quantile="0.75"} 0.000120499
go_gc_duration_seconds{quantile="1"} 0.000344099
go_gc_duration_seconds_sum 0.001536996
go_gc_duration_seconds_count 20
...
Feel free to poke around.
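To get a feel for the structure programmatically, here is a minimal sketch that parses one sample line of this text format. It is my own illustration, not the official client, and it ignores escaping inside label values:

```python
import re

# A rough sketch of one sample line in the text exposition format:
#   metric_name{label="value",...} sample_value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'  # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'           # optional label set
    r'\s+(?P<value>\S+)$'                   # sample value
)

def parse_line(line):
    """Return (name, labels, value) for a sample line, or None for comments."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None  # '# HELP' / '# TYPE' lines carry metadata, not samples
    m = LINE_RE.match(line)
    if not m:
        return None
    labels = {}
    if m.group('labels'):
        for pair in m.group('labels').split(','):
            key, value = pair.split('=', 1)
            labels[key.strip()] = value.strip().strip('"')
    return m.group('name'), labels, float(m.group('value'))
```

Running it over the sample above, a line like `go_gc_duration_seconds{quantile="0.25"} 8.8e-06` comes back as a name, a label dict, and a float value.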
Scenario: endpoint monitoring
This scenario works just like the default config above: Prometheus periodically pulls an endpoint, obtains its metrics, and writes them into its own time-series database.
Now let's develop a test endpoint of our own and configure it into Prometheus.
Taking Python as an example:
import flask
import random

app = flask.Flask(__name__)

@app.route('/metrics', methods=['GET'])
def hello():
    # Fake samples in the Prometheus text format: name{labels} value
    # (note: no space between the metric name and the label braces)
    return (
        f'suzumiya{{quantile="0.75"}} {random.random()}\n'
        f'kyo{{quantile="0.5"}} {random.random()}\n'
    )

if __name__ == '__main__':
    app.run('0.0.0.0', 12300)
This /metrics endpoint mimics the shape of the default one's data. Next, add it to the Prometheus config file.
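If you'd rather not depend on Flask, the same test endpoint can be sketched with only the standard library (this is my own equivalent, not the article's code):

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

def metrics_text():
    # The same fake samples as the Flask version, newline-terminated
    return (
        f'suzumiya{{quantile="0.75"}} {random.random()}\n'
        f'kyo{{quantile="0.5"}} {random.random()}\n'
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != '/metrics':
            self.send_error(404)
            return
        body = metrics_text().encode()
        self.send_response(200)
        # version=0.0.4 is the conventional content type for the text format
        self.send_header('Content-Type', 'text/plain; version=0.0.4')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To actually serve on port 12300 (this call blocks):
# HTTPServer(('0.0.0.0', 12300), MetricsHandler).serve_forever()
```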
At the end of ./docker/prometheus.yml
, append the following:
  - job_name: 'test'
    static_configs:
      - targets: ['10.23.51.15:12300'] # change to your own LAN/WAN IP
        labels:
          instance: 'test'
Then restart the project, and you can find your newly added metrics on the page.
alertmanager
For a monitoring platform, alerting is essential. alertmanager is the component that does it.
Basic configuration
In the docker-compose.yml
file, add the following:
  pro_alert_manager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/alertmanager  # persists state
      - ./docker/alertmanager.yml:/etc/alertmanager/alertmanager.yml  # pass the config in from outside
Create the ./docker/alertmanager.yml
file and fill in the following:
global:
  resolve_timeout: 5m
  smtp_smarthost:
  smtp_from:
  smtp_auth_username:
  smtp_auth_password:
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mememe'
receivers:
- name: 'mememe'
  #webhook_configs:
  #- url: 'http://127.0.0.1:5001/'
  email_configs:
  - to: 'xxx@xxx.com' # change the recipient
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Fill your mail settings into the corresponding blanks above.
That's actually all it takes to configure alerting, but if nothing triggers it there is nothing to see. So let's configure a rule that watches the test endpoint.
Put the following into the ./docker/prometheus.yml
file.
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - pro_alert_manager:9093 # docker automatically maps service names to hosts

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "test_rule.yml" # config file that doesn't exist yet
  # - "first_rules.yml"
  # - "second_rules.yml"
Next comes the rule configuration itself. Modify the docker-compose.yml
file to add a mapping for the external config file:
  pro_server:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./docker/test_rule.yml:/etc/prometheus/test_rule.yml # the new mapping
Create the ./docker/test_rule.yml
file and fill in the following:
groups:
- name: test-alert
  rules:
  - alert: HttpTestDown
    expr: sum(up{job="test"}) == 0
    for: 10s
    labels:
      severity: critical
Restart the project. There will be no alert at this point,
as long as your test server is still running. (Prometheus sets the up metric to 1 when a scrape succeeds and 0 when it fails, which is what the rule's expression watches.) Now stop the test server, and you should receive an email shortly.
pushgateway
This component can be loosely understood as a metrics drop-off point: you push requests to it, and Prometheus then scrapes the results from it.
Basic configuration
Modify docker-compose.yml
and add the following:
  pro_push_gateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"
    volumes:
      - ./pushgateway:/pushgateway # probably removable; the gateway seems to be stateless
Modify ./docker/prometheus.yml
and add the pushgateway as a job:
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['pro_push_gateway:9091'] # docker maps the service name to a host
        labels:
          instance: 'pushgateway'
Restart the project afterwards and the gateway takes effect.
There are many ways to call it; for testing we'll use the simplest one, curl:
echo "suzumiya 1000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test
echo "suzumiya 2000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test
echo "suzumiya 3000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test
Push a few values and you can see the corresponding data in Prometheus.
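If a service wants to push from code instead of shelling out to curl, a stdlib-only sketch might look like this (the gateway address and both helper names are my own, not from any Prometheus library):

```python
import urllib.request

def format_sample(name, value):
    # One sample in the text exposition format; the trailing newline is required
    return f'{name} {value}\n'.encode()

def push_metric(name, value, job, gateway='http://127.0.0.1:9091'):
    """Python equivalent of the curl calls above (gateway address assumed)."""
    req = urllib.request.Request(
        f'{gateway}/metrics/job/{job}',
        data=format_sample(name, value),
        method='POST',
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `push_metric('suzumiya', 1000, 'test')` mirrors the first curl line.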
Scenario: event instrumentation
Based on the pushgateway, services can push their own data to Prometheus. The most intuitive application is event instrumentation: push the events you care about to Prometheus, watch them with alertmanager, or hook up Grafana for a simple time-series dashboard.
Complete configuration files
./docker-compose.yml
version: "3.7"
services:
  pro_server:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./docker/test_rule.yml:/etc/prometheus/test_rule.yml
  pro_push_gateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"
    volumes:
      - ./pushgateway:/pushgateway
  pro_alert_manager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/alertmanager
      - ./docker/alertmanager.yml:/etc/alertmanager/alertmanager.yml
./docker/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - pro_alert_manager:9093

rule_files:
  - "test_rule.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'test'
    static_configs:
      - targets: ['10.23.51.15:12300']
        labels:
          instance: 'test'
  - job_name: 'pushgateway'
    static_configs:
      - targets: ['pro_push_gateway:9091']
        labels:
          instance: 'pushgateway'
./docker/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost:
  smtp_from:
  smtp_auth_username:
  smtp_auth_password:
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mememe'
receivers:
- name: 'mememe'
  #webhook_configs:
  #- url: 'http://127.0.0.1:5001/'
  email_configs:
  - to: 'xxxx@xxx.com'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
./docker/test_rule.yml
groups:
- name: test-alert
  rules:
  - alert: HttpTestDown
    expr: sum(up{job="test"}) == 0
    for: 10s
    labels:
      severity: critical
over
The above is just a small taste, but it shows that setting up a monitoring platform isn't hard. Compared with hand-writing alerting over and over in the past, this unified approach to endpoint monitoring and event instrumentation is more cross-platform and more elegant. There is still plenty in these components worth learning and digging into. Next time you have a monitoring requirement, it's worth asking: could Prometheus do it?
owari.