Building a Traefik Monitoring System

Background

  • The earlier post on learning Traefik already covered how to use it, but for a gateway the picture is incomplete without a dashboard that visualizes Traefik's traffic status and Access Log. This article describes how to build a monitoring setup for Traefik with Prometheus + Grafana + Promtail + Loki.
  • Prometheus is the de facto standard for system and service monitoring in the cloud-native era. It collects time-series data over HTTP using a pull model, discovers scrape targets through service discovery or static configuration, runs as a single self-contained server node, and supports a variety of charts and dashboards. In this article Prometheus collects Traefik's metrics.
  • Grafana is an open-source metrics analytics and visualization suite. It is a frontend written in pure JavaScript that queries data stores (such as InfluxDB or Prometheus) to render custom reports and charts. Its UI is flexible, it has a rich plugin ecosystem, and it is very powerful. In this article Grafana displays the data coming from Prometheus and Loki.
  • Promtail is a log-collection agent that ships the contents of local logs to a Loki instance. It is usually deployed on every machine/container whose applications need to be monitored. Promtail's main jobs are discovering targets, attaching labels to log streams, and pushing the logs to Loki. In this article Promtail collects Traefik's Access Log.
  • Grafana Loki is a set of components that can be composed into a fully featured logging stack. Unlike other logging systems, Loki indexes only a set of labels attached to each log stream rather than the full text of the log messages, which makes it cheaper to operate and orders of magnitude more efficient. In one sentence, Loki is "like Prometheus, but for logs". In this article Loki aggregates the log data coming from Promtail.

Traefik Configuration

  • For Traefik itself, the key step is enabling the Metrics and Access Log configuration. The static configuration file traefik.toml is as follows:

    [log]
      level = "WARN"
      format = "common"
      filePath = "/logs/traefik.log"

    [accessLog]
      filePath = "/logs/access.log"
      bufferingSize = 100
      format = "json"
      [accessLog.fields.names]
        "StartUTC" = "drop"
      [accessLog.filters]
        retryAttempts = true
        minDuration = "10ms"
    • Only the log-related parts of the configuration are shown here; a hedged sketch of the metrics-related parts follows at the end of this section.
    • Dropping StartUTC controls the time zone used for the log timestamps; it works together with the TZ environment variable.
  • The Docker Compose file traefik.yaml used to deploy Traefik is as follows:

    version: '3'

    services:
      reverse-proxy:
        image: traefik
        restart: always
        environment:
          - TZ=Asia/Shanghai
        ports:
          - "80:80"
          - "443:443"
        networks:
          - traefik
        volumes:
          - ./traefik.toml:/etc/traefik/traefik.toml
          - /var/run/docker.sock:/var/run/docker.sock
          - ./config/:/etc/traefik/config/:ro
          - ./acme.json:/letsencrypt/acme.json
          - ./logs:/logs/:rw
        container_name: traefik
        # Gateway health check
        healthcheck:
          test: ["CMD-SHELL", "wget -q --spider --proxy off localhost:8080/ping || exit 1"]
          interval: 3s
          timeout: 5s

    # Create the external network first: docker network create traefik
    networks:
      traefik:
        external: true
    • The key parts are:

      • the TZ environment variable, which sets the time zone used in the logs
      • mounting the local log directory ./logs
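
  • As promised above, here is a rough sketch of the non-log parts of traefik.toml that make the metrics, ping, and dashboard endpoints available. The section and option names follow Traefik v2's static configuration, but treat this as an assumption to verify against the Traefik version actually deployed, not as the exact configuration used in the original setup:

    # Hedged sketch of the metrics-related static configuration (not shown above)
    [api]
      insecure = true          # serves the dashboard/API on the internal :8080 entry point

    [ping]                     # enables /ping, used by the compose health check above

    [metrics]
      [metrics.prometheus]     # exposes /metrics, scraped later as traefik:8080
        addEntryPointsLabels = true
        addServicesLabels = true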

Setting Up the Monitoring Stack

  • The Prometheus configuration file prometheus-conf.yaml is as follows:

    global:
      scrape_interval: 15s
      external_labels:
        monitor: 'codelab-monitor'

    scrape_configs:
      - job_name: 'node'
        scrape_interval: 5s
        static_configs:
          - targets: ['traefik:8080']
  • The Loki configuration file loki.yaml:

    auth_enabled: false

    server:
      http_listen_port: 3100

    ingester:
      wal:
        dir: /loki/wal
      lifecycler:
        address: 127.0.0.1
        ring:
          kvstore:
            store: inmemory
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
      max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
      chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
      chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
      max_transfer_retries: 0     # Chunk transfers disabled

    schema_config:
      configs:
        - from: 2020-10-24
          store: boltdb-shipper
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 24h

    storage_config:
      boltdb_shipper:
        active_index_directory: /loki/boltdb-shipper-active
        cache_location: /loki/boltdb-shipper-cache
        cache_ttl: 24h            # Can be increased for faster performance over longer query periods, uses more disk space
        shared_store: filesystem
      filesystem:
        directory: /loki/chunks

    compactor:
      working_directory: /loki/boltdb-shipper-compactor
      shared_store: filesystem

    limits_config:
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    chunk_store_config:
      max_look_back_period: 0s

    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s

    ruler:
      storage:
        type: local
        local:
          directory: /loki/rules
      rule_path: /loki/rules-temp
      alertmanager_url: http://localhost:9093
      ring:
        kvstore:
          store: inmemory
      enable_api: true

    frontend:
      max_outstanding_per_tenant: 2048
  • The Promtail configuration file promtail.yaml:

    server:
      http_listen_port: 9080
      grpc_listen_port: 0

    positions:
      filename: /tmp/positions.yaml

    clients:
      - url: http://loki:3100/loki/api/v1/push

    scrape_configs:
    - job_name: app
      static_configs:
      - targets:
          - localhost
        labels:
          job: app
          __path__: /var/log/*log
    • Note that the job label set under labels is app; this value is needed later when setting up the dashboard in Grafana.
  • The Docker Compose file prometheus.yaml for the monitoring stack is as follows:

    version: "3"services:   prometheus:    restart: always    image: prom/prometheus:v2.28.0    container_name: prometheus    volumes:      - ./:/etc/prometheus/    command:      - "--config.file=/etc/prometheus/prometheus-conf.yaml"      - "--storage.tsdb.path=/prometheus"      - "--web.console.libraries=/etc/prometheus/console_libraries"      - "--web.console.templates=/etc/prometheus/consoles"      - "--storage.tsdb.retention.time=720h"      - "--web.enable-lifecycle"    ports:      - 9090:9090  grafana:    image: grafana/grafana:8.1.2    container_name: grafana    restart: always    ports:      - 3000:3000    depends_on:      - prometheus      - loki  loki:    image: grafana/loki    expose:      - "3100"    volumes:      - ./loki.yaml:/etc/loki/local-config.yaml      - loki_data:/loki    command: -config.file=/etc/loki/local-config.yaml  promtail:    image: grafana/promtail    depends_on:      - loki    volumes:      - /root/traefik/logs:/var/log      - ./promtail.yaml:/etc/promtail/config.yml    command: -config.file=/etc/promtail/config.ymlnetworks:  default:    external:      name: traefikvolumes:  loki_data:

Grafana Configuration

  • Once Grafana is reachable, log in with admin:admin (the default credentials; a sketch for overriding them follows).
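
  • Since admin:admin is only the default, it is worth changing it. As a rough sketch, the official Grafana image can take the initial admin password from an environment variable, so the grafana service in prometheus.yaml could be extended along these lines (the password value here is just a placeholder):

    grafana:
      image: grafana/grafana:8.1.2
      environment:
        - GF_SECURITY_ADMIN_PASSWORD=change-me   # placeholder; use your own secret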

Configuring Data Sources

  • Prometheus: in the left-hand menu go to Configuration → Data Sources, add/edit the Prometheus data source, set the URL to http://prometheus:9090, then Save & Test.
  • Loki: in the left-hand menu go to Configuration → Data Sources, add/edit the Loki data source, set the URL to http://loki:3100, then Save & Test.
  • After adding the two data sources, use Explore in the left-hand panel to check whether data can be queried. Taking Loki as an example: open the Log browser, pick the log file, then click Show Logs; the collected log lines should appear. (Both data sources can also be provisioned from a file; a sketch follows this list.)
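
  • As an alternative to clicking through the UI, the two data sources can be provisioned from a file. A minimal sketch, assuming Grafana's file-based provisioning and a file mounted into the container under /etc/grafana/provisioning/datasources/ (the file name datasources.yaml is arbitrary):

    # datasources.yaml (hedged sketch; mount under /etc/grafana/provisioning/datasources/)
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus:9090
      - name: Loki
        type: loki
        access: proxy
        url: http://loki:3100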

Configuring Dashboards

  • In the left-hand panel, Create->Import lets you import, by ID, one of the dashboards on the Grafana dashboard marketplace that supports Traefik metrics.

    • Star the dashboard and it can then be set as the home dashboard under Configuration->Preferences->Home Dashboard.
  • In the left-hand panel, Create->Import with ID 13713 imports the Traefik Via Loki dashboard, which is used to display Traefik's logs.

    • Using this dashboard requires two adjustments:

      • After importing the dashboard there is no data at first. Click the ⚙ button in the top-right corner to open the dashboard settings, go to JSON Model, and replace every {job="/var/log/traefik.log"} in the JSON with {job="app"} (the value from the Promtail configuration above), then Save Changes->Save Dashboard.

      • The Request Route part of the dashboard reports the error Panel plugin not found: grafana-piechart-panel. Run docker exec -i grafana sh -c 'grafana-cli plugins install grafana-piechart-panel' to install the plugin inside the container, then restart it with docker restart grafana. (An alternative via the compose file is sketched after this list.)
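
  • Instead of installing the plugin by hand inside the running container, the official Grafana image can also install plugins at startup through an environment variable. A sketch of the relevant addition to the grafana service in prometheus.yaml:

    grafana:
      image: grafana/grafana:8.1.2
      environment:
        - GF_INSTALL_PLUGINS=grafana-piechart-panel   # installed automatically when the container starts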

Miscellaneous

  • If Grafana is only for your own use, it is better not to expose the service on the public internet; instead, port forwarding can be used to map the service to your local machine (see the sketch below).
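
  • One common way to do that mapping is an SSH local forward from your workstation, so Grafana's port 3000 never needs to be reachable publicly. A sketch, with user@server standing in for your own login to the Docker host:

    # Forward local port 3000 to Grafana on the remote Docker host (user@server is hypothetical)
    ssh -N -L 3000:127.0.0.1:3000 user@server
    # Then open http://localhost:3000 in a local browser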

References

  • Traefik 2 monitoring: combining Grafana, Prometheus, Promtail and Loki
  • From ELK/EFK to PLG: a Promtail + Loki + Grafana container logging solution on EKS
  • Traefik Logs
  • loki mkdir wal: permission denied
  • Are you trying to mount a directory onto a file (or vice-versa)?

    • Note: when this error shows up, first check whether the directory you specified is actually correct, e.g. whether the name is mistyped...
  • Grafana Plugin Install over Docker
  • Traefik Via Loki
  • Reference for an alternative approach: ElasticSearch + FileBeat + Grafana
  • Datasource proxy returning "too many outstanding requests"