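Introduction
Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration, such as email, WeChat, or DingTalk. It also handles silencing, time-based muting (sending or not sending on a schedule), and inhibition of alerts.
As an open-source alerting application designed for Prometheus, Alertmanager already offers the rich, flexible, and customizable features expected of such a tool:
- Prometheus AlertManager article series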
Jiralert
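A Prometheus Alertmanager webhook receiver for JIRA.
JIRAlert implements Alertmanager's webhook HTTP API and connects to one or more JIRA instances to create highly configurable JIRA issues. One issue is created per distinct group key (as defined by the group_by parameter in the route section of the Alertmanager configuration), and by default it is not closed when the alert resolves; this default can be changed. The expectation is that a person will look at the issue, take any necessary action, and then close it. If no human interaction is needed, the condition probably should not have alerted in the first place. This behavior can nevertheless be changed by setting the auto_resolve section, which resolves the JIRA issue by moving it to the required state.
If a corresponding JIRA issue already exists but has been resolved, it is reopened. A JIRA transition must exist between the resolved state and the reopened state (as defined by reopen_state), otherwise reopening fails. Optionally, a "won't fix" resolution can be defined through wont_fix_resolution: JIRA issues with this resolution are not reopened by JIRAlert.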
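The lifecycle described above can be summarized as a small decision function. This is a simplified sketch of the documented behavior, not Jiralert's actual code: the function name `decide_action`, its parameters, and the action strings it returns are all hypothetical and exist only for illustration.

```python
from typing import Optional


def decide_action(
    alert_status: str,                      # "firing" or "resolved", as sent by Alertmanager
    issue_exists: bool,                     # an issue for this group key was found
    issue_resolved: bool = False,           # that issue is currently resolved
    issue_resolution: Optional[str] = None,
    wont_fix_resolution: Optional[str] = None,
    auto_resolve_enabled: bool = False,
) -> str:
    """Return a hypothetical action name for one alert group."""
    if alert_status == "resolved":
        # By default a resolved alert leaves the issue open for a person to close.
        if issue_exists and not issue_resolved and auto_resolve_enabled:
            return "resolve"
        return "none"

    # From here on the alert group is firing.
    if not issue_exists:
        return "create"
    if not issue_resolved:
        return "update"
    if wont_fix_resolution is not None and issue_resolution == wont_fix_resolution:
        # Issues closed as "won't fix" are never reopened.
        return "skip"
    return "reopen"
```

For example, `decide_action("firing", True, issue_resolved=True)` yields `"reopen"`, while the same call with a matching won't-fix resolution yields `"skip"`.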
Installing Jiralert
Installing Jiralert is fairly simple. It consists mainly of a Deployment, a Secret (the Jiralert configuration), and a Service. A typical example follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jiralert
spec:
  selector:
    matchLabels:
      app: jiralert
  template:
    metadata:
      labels:
        app: jiralert
    spec:
      containers:
        - name: jiralert
          image: quay.io/jiralert/jiralert-linux-amd64:latest
          imagePullPolicy: IfNotPresent
          args:
            - "--config=/jiralert-config/jiralert.yml"
            - "--log.level=debug"
            - "--listen-address=:9097"
          readinessProbe:
            tcpSocket:
              port: 9097
            initialDelaySeconds: 15
            periodSeconds: 15
            timeoutSeconds: 5
          livenessProbe:
            tcpSocket:
              port: 9097
            initialDelaySeconds: 15
            periodSeconds: 15
            timeoutSeconds: 5
          ports:
            # Same port as --listen-address above.
            - containerPort: 9097
              name: metrics
          volumeMounts:
            - mountPath: /jiralert-config
              name: jiralert-config
              readOnly: true
      volumes:
        - name: jiralert-config
          secret:
            secretName: jiralert-config
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: jiralert-config
stringData:
  jiralert.tmpl: |-
    {{define "jira.summary"}}[{{.Status | toUpper}}{{if eq .Status "firing"}}:{{.Alerts.Firing | len}}{{end}}] {{.GroupLabels.SortedPairs.Values | join ","}}{{end}}
    {{define "jira.description"}}{{range .Alerts.Firing}}Labels:
    {{range .Labels.SortedPairs}} - {{.Name}} = {{.Value}}
    {{end}}
    Annotations:
    {{range .Annotations.SortedPairs}} - {{.Name}} = {{.Value}}
    {{end}}
    Source: {{.GeneratorURL}}
    {{end}}
    CommonLabels:
    {{range .CommonLabels.SortedPairs}} - {{.Name}} = {{.Value}}
    {{end}}
    GroupLabels:
    {{range .GroupLabels.SortedPairs}} - {{.Name}} = {{.Value}}
    {{end}}
    {{end}}
  jiralert.yml: |-
    # Global defaults, applied to all receivers where not explicitly overridden. Optional.
    template: jiralert.tmpl
    defaults:
      # API access fields.
      api_url: https://jira.example.com
      user: foo
      password: bar
      # The type of JIRA issue to create. Required.
      issue_type: Bug
      # Issue priority. Optional.
      priority: Major
      # Go template invocation for generating the summary. Required.
      summary: '{{ template "jira.summary" . }}'
      # Go template invocation for generating the description. Optional.
      description: '{{ template "jira.description" . }}'
      # State to transition into when reopening a closed issue. Required.
      reopen_state: "REOPENED"
      # Do not reopen issues with this resolution. Optional.
      wont_fix_resolution: "Won't Fix"
      # Amount of time after being closed that an issue should be reopened, after which, a new issue is created.
      # Optional (default: always reopen)
      # reopen_duration: 30d
    # Receiver definitions. At least one must be defined.
    # Receiver names must match the Alertmanager receiver names. Required.
    receivers:
      - name: 'jiralert'
        project: 'YOUR-JIRA-PROJECT'
---
apiVersion: v1
kind: Service
metadata:
  name: jiralert
spec:
  selector:
    app: jiralert
  ports:
    - port: 9097
      targetPort: 9097
The corresponding Alertmanager configuration:
...
receivers:
  - name: jiralert
    webhook_configs:
      - send_resolved: true
        url: http://jiralert:9097/alert
# The routes list below belongs under the top-level route block.
routes:
  - receiver: jiralert
    matchers:
      - severity = critical
    continue: true
...
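📝 Notes:
- Official jiralert image repository: https://quay.io/repository/ji…
- Official jiralert latest image: <quay.io/jiralert/jiralert-linux-amd64:latest>
- jiralert.tmpl: similar to an Alertmanager template. Issues sent to Jira are rendered from this template.
- jiralert.yml: the Jiralert configuration file
  - defaults: the base configuration
  - receivers: multiple receivers can be defined. The Alertmanager receiver that should deliver to Jira must have the same name as the corresponding jiralert receiver (in the example above, both are named jiralert).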
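Because the receiver name carried in the webhook payload has to match a Jiralert receiver, it can help to send a hand-built payload before relying on real alerts. The sketch below builds a minimal payload in Alertmanager's webhook format (version 4) and posts it to the Service URL used above. The alert name `JiralertSmokeTest`, the example.com URLs, and both function names are made up for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_payload(receiver: str = "jiralert") -> dict:
    """Build a minimal Alertmanager webhook payload (format version 4)."""
    # Hypothetical labels; severity matches the route shown above.
    labels = {"alertname": "JiralertSmokeTest", "severity": "critical"}
    annotations = {"summary": "Test alert, safe to close"}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="JiralertSmokeTest"}',
        "status": "firing",
        "receiver": receiver,  # must equal a receiver name in jiralert.yml
        "groupLabels": {"alertname": labels["alertname"]},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://alertmanager.example.com",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": datetime.now(timezone.utc).isoformat(),
                "endsAt": "0001-01-01T00:00:00Z",
                "generatorURL": "http://prometheus.example.com/graph",
            }
        ],
    }


def send_test_alert(url: str = "http://jiralert:9097/alert") -> int:
    """POST the payload to Jiralert and return the HTTP status code."""
    body = json.dumps(build_test_payload()).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `send_test_alert()` from a pod inside the cluster should create a real issue in the configured project, so it is best pointed at a test project first.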
Jiralert Configuration
A complete Jiralert configuration, proven in production, looks like this:
# Global defaults, applied to all receivers where not explicitly overridden. Optional.
template: jiralert.tmpl
defaults:
  # API access fields.
  api_url: https://example.atlassian.net
  user: <your-account-email>
  password: '<your-account-api-token>'
  # The type of JIRA issue to create. Required.
  issue_type: Support
  # Issue priority. Optional.
  priority: High
  # Go template invocation for generating the summary. Required.
  summary: '{{ template "jira.summary" . }}'
  # Go template invocation for generating the description. Optional.
  description: '{{ template "jira.description" . }}'
  # State to transition into when reopening a closed issue. Required.
  reopen_state: "Back to in progress"
  # Do not reopen issues with this resolution. Optional.
  wont_fix_resolution: "Won't Do"
  # Amount of time after being closed that an issue should be reopened, after which, a new issue is created.
  # Optional (default: always reopen)
  reopen_duration: 30d
# Receiver definitions. At least one must be defined.
# Receiver names must match the Alertmanager receiver names. Required.
receivers:
  - name: 'jiralert'
    project: <your-project-code>
    add_group_labels: true
    auto_resolve:
      state: 'Resolve this issue'
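📝 Details:
- api_url: the Jira address. If you use Jira's SaaS service, it is https://<tenant>.atlassian.net
- Authentication:
  - For the cloud-hosted edition of Jira, only user and password can be used. user is your account email address; for password you must first request an API token at API Token | Atlassian account. (🐾 Note: the password you log in with will not authenticate.)
  - For other editions, you can also authenticate with personal_access_token. Its value is the base64-encoded form of user@example.com:api_token_string. For details, see: Basic auth for REST APIs (atlassian.com)
- issue_type: fill in according to your Jira issue types, for example Alert, Support, Bug, New Feature, or others
- priority: fill in according to your issue priorities, for example Critical, High, Medium, Low, or others
- reopen_state: the transition needed to reopen a Jira issue that has already been closed, for example Back to in progress. (🐾 Note: this must be your custom transition, not a status.)
- wont_fix_resolution: issues with this resolution are not reopened, for example Won't Do or Won't Fix. Fill it in according to your own resolution definitions.
- reopen_duration: how long after being closed an issue is still reopened. The default is always reopen. It can be set to a value such as 30d, meaning that if the matching issue was closed more than 30 days ago, a new issue is created instead of reopening the old one.
- receivers: multiple receivers can be defined, each pointing to a different project
- project: the Jira project key, typically the uppercase initials of the project's full name. If the project is For Example, fill in FE.
- add_group_labels: whether to add Alertmanager's group labels to the Jira labels. (🐾 Note: Jira label values cannot contain spaces, so if any of your Alertmanager group label values contain spaces, do not enable this option.)
- auto_resolve: a feature added in the latest version, 1.2. When an alert recovers, the corresponding Jira issue can be resolved automatically.
  - state: 'Resolve this issue': here too, fill in the predefined Jira transition that resolves the issue, not a status, for example 'Resolve this issue'.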
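The reopen_duration rule above can be illustrated with a short sketch. It only mirrors the documented behavior and is not how Jiralert implements it; the helper names `parse_duration` and `should_reopen` are hypothetical, and the parser accepts just a single number followed by m, h, d, or w, which is a simplification of the duration strings the real configuration may accept.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Simplified unit table; an assumption made for this sketch.
_UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text: str) -> timedelta:
    """Parse a simple duration such as '30d' or '12h' (one unit only)."""
    match = re.fullmatch(r"(\d+)([mhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def should_reopen(
    resolved_at: datetime,
    now: datetime,
    reopen_duration: Optional[str],
) -> bool:
    """True if the old issue is reopened, False if a new one is created."""
    if reopen_duration is None:
        # Default behavior: always reopen.
        return True
    return now - resolved_at <= parse_duration(reopen_duration)


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
recently_closed = now - timedelta(days=10)
closed_long_ago = now - timedelta(days=45)
```

With reopen_duration set to 30d, `should_reopen(recently_closed, now, "30d")` is true and `should_reopen(closed_long_ago, now, "30d")` is false, so the second case would result in a new issue.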
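Other Troubleshooting Cases
If you run into all sorts of strange log messages, most of them are caused by failing to authenticate correctly. A typical example is this error:
The value 'XXX' does not exist for the field 'project'.
In fact, it is caused by failing to authenticate correctly.
For details, see: Solved: REST error “The value ‘XXX’ does not exist for the… (atlassian.com)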
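One way to rule out an authentication problem is to call Jira's REST API directly with the same credentials that Jiralert uses. The sketch below builds a Basic auth request against the `/rest/api/2/myself` endpoint, assuming an account email plus API token as described in the configuration notes; the function names and the example values are placeholders.

```python
import base64
import urllib.request


def basic_auth_header(user: str, api_token: str) -> str:
    """Build an HTTP Basic Authorization header value from user and token."""
    raw = f"{user}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_myself_request(
    api_url: str, user: str, api_token: str
) -> urllib.request.Request:
    """Prepare a GET request that returns the authenticated user's profile."""
    return urllib.request.Request(
        api_url.rstrip("/") + "/rest/api/2/myself",
        headers={
            "Authorization": basic_auth_header(user, api_token),
            "Accept": "application/json",
        },
    )


# Placeholder values; substitute the ones from jiralert.yml.
request = build_myself_request(
    "https://example.atlassian.net", "user@example.com", "api_token_string"
)
```

Passing `request` to `urllib.request.urlopen` should return HTTP 200 when the credentials are valid and raise `urllib.error.HTTPError` with status 401 when they are not.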
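Another class of errors tells you that you cannot transition an issue. This is usually due to one of the following:
- reopen_state, or state under auto_resolve, in the Jiralert configuration is not set to the correct transition
- The account you use does not have the required permission
- The issue's current status (for example Closed) does not allow any further transition
For details, see: I can’t transition an issue in my Jira project – W… – Atlassian Community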
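To see which transition names are actually available for an issue in its current status, Jira's REST API offers `GET /rest/api/2/issue/{issueIdOrKey}/transitions`. The sketch below parses such a response and checks a configured value against it. The sample response body is hand-written for illustration, and the helper names are hypothetical.

```python
import json
from typing import Dict


def available_transitions(response_body: str) -> Dict[str, str]:
    """Map each transition name to the status it leads to."""
    data = json.loads(response_body)
    return {t["name"]: t["to"]["name"] for t in data.get("transitions", [])}


def is_valid_transition(configured: str, response_body: str) -> bool:
    """Check that a configured reopen_state or auto_resolve state is a transition name."""
    return configured in available_transitions(response_body)


# Hand-written sample of what the transitions endpoint might return.
sample_response = json.dumps(
    {
        "transitions": [
            {"id": "11", "name": "Back to in progress", "to": {"name": "In Progress"}},
            {"id": "21", "name": "Resolve this issue", "to": {"name": "Resolved"}},
        ]
    }
)
```

In this sample, `is_valid_transition("Back to in progress", sample_response)` is true, while the status name `"In Progress"` is rejected, which is the transition-versus-status mix-up described above.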
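Final Result
As shown in the figure below:
Jiralert can create an issue and update its Summary, Description, Resolution, and Status. When the same problem occurs again, it reopens the previous issue…
🎉🎉🎉
📚️ References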
- jiralert manifests for kubernetes (github.com)
- jiralert/examples at master · prometheus-community/jiralert (github.com)
- jiralert images | Quay
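Among any three people walking together, one can be my teacher; knowledge shared belongs to everyone. This article was written by the 东风微鸣 tech blog, EWhisper.cn.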