简介

Alertmanager 解决由客户端应用程序(如 Prometheus server)发送的警报。它负责去重(deduplicating),分组(grouping),并将它们路由(routing)到正确的接收器(receiver)集成,如电子邮件,微信,或钉钉。它还负责解决警报的静默/屏蔽(silencing)、定时发送/不发送(Mute)和克制(inhibition)问题。

AlertManager 作为 开源的为 Prometheus 而设计的告警利用, 曾经具备了告警利用各类丰盛、灵便、可定制的性能:

  • Prometheus AlertManager 系列文章

Jiralert

用于JIRA的Prometheus Alertmanager Webhook Receiver

JIRAlert实现了Alertmanager的webhook HTTP API,并连贯到一个或多个JIRA实例以创立高度可配置的JIRA Issues。每个不同的 Groupkey 创立一个Issue--由Alertmanager的路由配置局部的group_by参数定义--但在警报解决时不会敞开(默认参数, 可调整)。咱们的冀望是,人们会查看这个issue。,采取任何必要的口头,而后敞开它。如果没有人的互动是必要的,那么它可能首先就不应该报警。然而,这种行为能够通过设置auto_resolve局部进行批改,它将以所需的状态解决jira issue。

如果一个相应的JIRA issue。曾经存在,但被解决了,它将被从新关上(reopened)。在解决的状态和重开的状态之间必须存在一个JIRA transition--如reopen_state--否则重开将失败。能够抉择定义一个 "won't fix" 的决定(resolution)--由wont_fix_resolution定义:有此决定的JIRA问题将不会被JIRAlert从新关上。

装置 Jiralert

Jiralert 的装置比较简单, 次要由 Deployment、Secret(Jiralert 的配置)和 Service 组成。典型示例如下:

apiVersion: apps/v1kind: Deploymentmetadata:  name: jiralertspec:  selector:    matchLabels:      app: jiralert  template:    metadata:      labels:        app: jiralert    spec:      containers:      - name: jiralert        image: quay.io/jiralert/jiralert-linux-amd64:latest        imagePullPolicy: IfNotPresent        args:        - "--config=/jiralert-config/jiralert.yml"        - "--log.level=debug"        - "--listen-address=:9097"        readinessProbe:          tcpSocket:            port: 9097          initialDelaySeconds: 15          periodSeconds: 15          timeoutSeconds: 5        livenessProbe:          tcpSocket:            port: 9097          initialDelaySeconds: 15          periodSeconds: 15          timeoutSeconds: 5        ports:        - containerPort: 9091          name: metrics        volumeMounts:        - mountPath: /jiralert-config          name: jiralert-config          readOnly: true      volumes:      - name: jiralert-config        secret:          secretName: jiralert-config---apiVersion: v1kind: Secrettype: Opaquemetadata:  name: jiralert-configstringData:  jiralert.tmpl: |-    {{ define "jira.summary" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join "," }}{{ end }}        {{ define "jira.description" }}{{ range .Alerts.Firing }}Labels:    {{ range .Labels.SortedPairs }} - {{ .Name }} = {{ .Value }}    {{ end }}        Annotations:    {{ range .Annotations.SortedPairs }} - {{ .Name }} = {{ .Value }}    {{ end }}        Source: {{ .GeneratorURL }}    {{ end }}        CommonLabels:    {{ range .CommonLabels.SortedPairs }} - {{ .Name }} = {{ .Value}}    {{ end }}        GroupLabels:    {{ range .GroupLabels.SortedPairs }} - {{ .Name }} = {{ .Value}}    {{ end }}    {{ end }}  jiralert.yml: |-    # Global defaults, applied to all receivers where not explicitly overridden. Optional.    template: jiralert.tmpl    defaults:      # API access fields.      api_url: https://jira.example.com      user: foo      password: bar      # The type of JIRA issue to create. Required.      issue_type: Bug      # Issue priority. Optional.      priority: Major      # Go template invocation for generating the summary. Required.      summary: '{{ template "jira.summary" . }}'      # Go template invocation for generating the description. Optional.      description: '{{ template "jira.description" . }}'      # State to transition into when reopening a closed issue. Required.      reopen_state: "REOPENED"      # Do not reopen issues with this resolution. Optional.      wont_fix_resolution: "Won't Fix"      # Amount of time after being closed that an issue should be reopened, after which, a new issue is created.      # Optional (default: always reopen)      # reopen_duration: 30d        # Receiver definitions. At least one must be defined.    # Receiver names must match the Alertmanager receiver names. Required.    receivers:    - name: 'jiralert'      project: 'YOUR-JIRA-PROJECT'---apiVersion: v1kind: Servicemetadata:  name: jiralertspec:  selector:    app: jiralert  ports:  - port: 9097    targetPort: 9097                

相应 AlertManager 的配置:

...receivers:- name: jiralert  webhook_configs:  - send_resolved: true    url: http://jiralert:9097/alertroutes:- receiver: jiralert  matchers:  - severity = critical  continue: true...

阐明:

  • 官网 jiralert 镜像地址: https://quay.io/repository/ji...

    • 官网 jiralert latest 镜像: <quay.io/jiralert/jiralert-linux-amd64:latest>
  • jiralert.tmpl 相似 AlertManager 的 Template, 发送到 Jira 的 Issue 会以此为模板
  • jiralert.yml Jiralert 的配置文件

    • defaults 根底版配置
    • receivers 能够设置多个 receiver, 届时 AlertManager 要发到哪个 Jira 的receiver就须要与这个 jiralert 的receiver 同名. (比方下面的例子, 都是jiralert)

Jiralert 配置

通过生产实践的 Jiralert 残缺配置如下:

# Global defaults, applied to all receivers where not explicitly overridden. Optional.template: jiralert.tmpldefaults:  # API access fields.  api_url: https://example.atlassian.net  user: <your-account-email>  password: '<your-account-api-token>'  # The type of JIRA issue to create. Required.  issue_type: Support  # Issue priority. Optional.  priority: High  # Go template invocation for generating the summary. Required.  summary: '{{ template "jira.summary" . }}'  # Go template invocation for generating the description. Optional.  description: '{{ template "jira.description" . }}'  # State to transition into when reopening a closed issue. Required.  reopen_state: "Back to in progress"  # Do not reopen issues with this resolution. Optional.  wont_fix_resolution: "Won't Do"  # Amount of time after being closed that an issue should be reopened, after which, a new issue is created.  # Optional (default: always reopen)  reopen_duration: 30d# Receiver definitions. At least one must be defined.# Receiver names must match the Alertmanager receiver names. Required.receivers:- name: 'jiralert'  project: <your-project-code>  add_group_labels: true  auto_resolve:    state: 'Resolve this issue'

具体阐明如下:

  1. api_url: Jira 的地址, 如果用的是 Jira 的 SaaS 服务, 就是https://<tenant>.atlassian.net
  2. 认证:

    1. 对于私有云版的 Jira, 只能用 userpassword, 其中:

      1. user 填写你的账号邮箱地址;
      2. password 须要先在 API Token | Atlassian account 申请 API Token. (留神: 登录用的明码是无奈认证通过的)
    2. 对于其余版本, 也能够填写应用 personal_access_token 进行认证. 其值为: user@example.com:api_token_string 的 base64 编码后字符串. 具体阐明见: Basic auth for REST APIs (atlassian.com)
  3. issue_type: 依据您的 Jira Issue Type 来填写, 可能是: Alert Support Bug New Feature 等等或其余
  4. priority 依据您的 Issue priority 来填写, 可能是: Critical High Medium Low 等等或其余
  5. reopen_state: Jira 的问题曾经敞开, 要从新关上, 须要的 transition, 如: Back to in progress. (留神: 这里须要填写的是您自定义的 transition, 而非 status)
  6. wont_fix_resolution: 带有这个 resolution (解决方案)的问题就不会从新关上. 如: Won't Do Won't Fix, 须要依据本人的 resolution 定义内容来填写.
  7. reopen_duration: 多久工夫之内的问题会从新关上, 默认是 always reopen, 能够设置为如: 30d, 示意这个问题如果30天以前有同样的问题, 新开一个 Issue, 而不是从新关上老的 Issue.
  8. receivers: 能够定义多个 receivers, 指向不同 project
  9. project: Jira 的 Project ID, 是 Project 具体名字的首字母大写. 如 Project 是 For Example, 这里就填写 FE
  10. add_group_labels: 是否要将 AlertManager 的 Group Labels 加到 Jira 的 Labels. (留神: Jira Labels 的 Value 是不能有空格的, 所以如果你的 AlertManager 的 Group Label 的Value如果有空格, 不要开启此项性能)
  11. auto_resolve: 最新 1.2 版本新增的性能, 当告警复原了, 能够主动 resolve 对应的 Jira Issue.

    1. state: 'Resolve this issue' 这里也是要填写您预约义的 Jira 解决该问题的 transition 而非 status, 如'Resolve this issue'.

其余疑难状况

如果你碰到各种诡异的日志, 起因大部分都是因为没有正确认证登录导致的, 典型的比方这个报错:

The value 'XXX' does not exist for the field 'project'.

事实上就是因为没有正确认证登录导致的.

具体能够参考这里: Solved: REST error "The value 'XXX' does not exist for the... (atlassian.com)

还有一类报错, 提醒您无奈 transition an issue, 这往往是因为以下几种起因:

  1. Jiralert 中reopen_stateauto_resolvestate 没有填写正确的 transition
  2. 您用的账号没有相应的权限
  3. 该 Issue 当初所处的状态(比方 Closed)不容许再进行 transition

具体能够参考这里: I can't transition an issue in my Jira project - W... - Atlassian Community

最终成果

如下图:

能够创立 Issue, 更新 Summary, 更新 Description, 更新 Resolution, 更新 Status; 同样问题再次出现, reopen 之前的 Issue...

️ 参考文档

  • jiralert manifests for kubernetes (github.com)
  • jiralert/examples at master · prometheus-community/jiralert (github.com)
  • jiralert images | Quay
三人行, 必有我师; 常识共享, 天下为公. 本文由东风微鸣技术博客 EWhisper.cn 编写.