乐趣区

关于mysql:PrometheusGrafana钉钉部署一个单机的MySQL监控告警系统

  • GreatSQL 社区原创内容未经受权不得随便应用,转载请分割小编并注明起源。
  • GreatSQL 是 MySQL 的国产分支版本,应用上与 MySQL 统一。

[toc]

一、Prometheus

# 1. 下载
wget https://github.com/prometheus/prometheus/releases/download/v2.35.0/prometheus-2.35.0.linux-amd64.tar.gz

# 2. 解压
tar xvpf prometheus-2.35.0.linux-amd64.tar.gz -C /usr/local

# 3. 建软链
ln -s /usr/local/prometheus-2.35.0.linux-amd64 /usr/local/prometheus

# 4. 建用户和目录并开权限
groupadd prometheus
useradd prometheus -g prometheus -s /sbin/nologin
mkdir -p /data/prometheus
chown prometheus.prometheus /data/prometheus -R
chown prometheus.prometheus /usr/local/prometheus/ -R

# 5. 配置启动
echo '
[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus \
          --config.file=/usr/local/prometheus/prometheus.yml \
          --storage.tsdb.path=/data/prometheus \
          --web.console.templates=/usr/local/prometheus/consoles \
          --web.console.libraries=/usr/local/prometheus/console_libraries
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=60s
[Install]
WantedBy=multi-user.target
' > /usr/lib/systemd/system/prometheus.service

# 6. 增加 prometheus.yml 配置
echo '
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093
    scheme: http
    timeout: 10s

rule_files:
- /usr/local/prometheus/rules.d/*.rules

scrape_configs:
- job_name: prometheus
  honor_timestamps: true
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090

- job_name: node-exporter
  honor_timestamps: true
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9100

- job_name: mysqld-exporter
  honor_timestamps: true
  scrape_interval: 5s
  scrape_timeout: 5s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9104
' > /usr/local/prometheus/prometheus.yml


# 7. 启动
systemctl enable prometheus.service
systemctl start prometheus.service

# 8. 确认开启
[root@mgr2 prometheus]# netstat -nltp|grep prometheus
tcp6       0      0 :::9090                 :::*                    LISTEN      11028/prometheus

9. 浏览器拜访呈现 prometheus 的治理后盾

http://192.168.6.216:9090

以上单机版的 prometheus 服务端就部署实现了,接下来咱们部署下node_exportmysqld_export 来采集零碎和 MySQL 的监控数据。

二、exporter

exporter 是客户端采集模块,除了零碎模块 node_exporter 之外,每个利用都有本人相应的模块,比方 MySQL 的 mysqld_exporter

建设一个 exporter 对立治理目录

mkdir -p /usr/local/prometheus_exporter
chown prometheus.prometheus /usr/local/prometheus_exporter -R

2.1 node_exporter

用来监控零碎指标的 exporter 包含内存、CPU、磁盘空间、磁盘 IO、网络等一系列指标数据。

# 1. 下载解压
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.0/node_exporter-0.18.0.linux-amd64.tar.gz
tar xvpf node_exporter-0.18.0.linux-amd64.tar.gz
cd node_exporter-0.18.0.linux-amd64/
mv node_exporter /usr/local/prometheus_exporter/
chown prometheus:prometheus /usr/local/prometheus_exporter/ -R

# 2. 配置启动服务
echo '
[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus_exporter/node_exporter
Restart=on-failure
RestartSec=60s
[Install]
WantedBy=multi-user.target
' > /usr/lib/systemd/system/node_exporter.service

# 5. 启动
systemctl enable node_exporter.service
systemctl start node_exporter.service

# 6. 确认开启
[root@mgr2 node_exporter]# netstat -nltp|grep node_exporter
tcp6       0      0 :::9100                 :::*                    LISTEN      15654/node_exporter

# 7. 确认采集到数据
[root@mgr2 prometheus]# curl http://192.168.6.216:9100/metrics
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 273849.94
node_cpu_seconds_total{cpu="0",mode="iowait"} 607.22
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 84.82
node_cpu_seconds_total{cpu="0",mode="softirq"} 3.35
node_cpu_seconds_total{cpu="0",mode="steal"} 0
node_cpu_seconds_total{cpu="0",mode="system"} 5026.1
node_cpu_seconds_total{cpu="0",mode="user"} 3723.54
# HELP node_disk_io_now The number of I/Os currently in progress.
# TYPE node_disk_io_now gauge
node_disk_io_now{device="dm-0"} 0
node_disk_io_now{device="dm-1"} 0

2.2 mysqld_exporter

监控 MySQL 的 exporter,包含连接数、同步状态,InnoDB 状态、响应状态等。

# 下载解压
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz
tar xvpf mysqld_exporter-0.11.0.linux-amd64.tar.gz
cd mysqld_exporter-0.11.0.linux-amd64
mv mysqld_exporter /usr/local/prometheus_exporter/
chown prometheus:prometheus /usr/local/prometheus_exporter/ -R

# 3. 创立监控用的账户权限,数据库是 8.0 版本
CREATE USER 'mysqlmonitor'@'127.0.0.1' IDENTIFIED BY 'mc.2022' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqlmonitor'@'127.0.0.1';
ALTER USER 'mysqlmonitor'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'mc.2022';
flush privileges;

# 4. 配置启动服务
vi /usr/lib/systemd/system/mysqld_exporter.service

[Unit]
Description=mysqld_exporter
After=network.target
[Service]
Type=simple
User=prometheus
Environment='DATA_SOURCE_NAME=mysqlmonitor:mc.2022@tcp(127.0.0.1:3306)'
ExecStart=/usr/local/prometheus_exporter/mysqld_exporter \
          --config.my-cnf='/data/GreatSQL/my.cnf' \
          --collect.engine_innodb_status \
          --collect.slave_status \
          --web.listen-address=:9104 \
          --web.telemetry-path=/metrics
Restart=on-failure
RestartSec=60s
[Install]
WantedBy=multi-user.target

# 5. 启动
systemctl enable mysqld_exporter.service
systemctl start mysqld_exporter.service

# 6. 确认开启
[root@mgr2 prometheus]# netstat -nltp|grep mysqld_export
tcp6       0      0 :::9104                 :::*                    LISTEN      14639/mysqld_export

# 7. 确认采集到数据
[root@mgr2 prometheus]# curl http://192.168.6.216:9104/metrics
# TYPE mysql_up gauge
mysql_up 1
......

三、grafana

通过 grafana 咱们能够将采集到的数据通过可视化的形式展示进去,对采集的数据进行展现和分类等。

grafana 的数据源既能够是 prometheus 也能够是 zabbix、ES 等、这是一个提供多种数据接口的数据展现软件。

3.1 部署

# 1. 通过 rpm 装置
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.3-1.x86_64.rpm
yum install grafana-enterprise-8.5.3-1.x86_64.rpm

# 2. 备份先原来的配置
mv /etc/grafana/grafana.ini /etc/grafana/grafana.ini.`date +"%Y-%m-%d"`.bak

# 3. 创立目录
mkdir -p /data/grafana
mkdir -p /data/logs/grafana
mkdir -p /usr/local/grafana/plugins
chown grafana.grafana /data/grafana
chown grafana.grafana /data/logs/grafana
chown grafana.grafana /usr/local/grafana/plugins
 
# 4. 拷贝模板文件进行替换

echo '
app_mode = production
[paths]
data = /data/grafana
temp_data_lifetime = 24h
logs = /data/logs/grafana
plugins = /usr/local/grafana/plugins

[server]
protocol = http
http_port = 3000
domain = gkht
root_url = http://192.168.6.216:3000
enable_gzip = true

[database]
log_queries =

[remote_cache]
[session]
provider = file
[dataproxy]
[analytics]
reporting_enabled = false
check_for_updates = false
[security]
admin_user = admin
admin_password = admin
secret_key = SW2YcwTIb9zpOOhoPsMm
[snapshots]
[dashboards]
versions_to_keep = 10
[users]
default_theme = dark
[auth]
[auth.anonymous]
enabled = true
org_role = Viewer
[auth.github]
[auth.google]
[auth.generic_oauth]
[auth.grafana_com]
[auth.proxy]
[auth.basic]
[auth.ldap]
[smtp]
[emails]
[log]
mode = console file
level = info
[log.console]
[log.file]
log_rotate = true
daily_rotate = true
max_days = 30
[log.syslog]
[alerting]
enabled = true
execute_alerts = true
[explore]
[metrics]
enabled = true
interval_seconds = 10
[metrics.graphite]
[tracing.jaeger]

[grafana_com]
url = https://grafana.com
[external_image_storage]
[external_image_storage.s3]
[external_image_storage.webdav]
[external_image_storage.gcs]
[external_image_storage.azure_blob]
[external_image_storage.local]
[rendering]
[enterprise]
[panels]
' > /etc/grafana/grafana.ini

chown grafana.grafana /etc/grafana/grafana.ini

# 5. 开启
systemctl enable grafana-server.service
systemctl start grafana-server.service

# 6. 查看开启状态
[root@mgr2 opt]# netstat -nltp|grep grafana
tcp6       0      0 :::3000                 :::*                    LISTEN      23647/grafana-serve

7. 浏览器拜访

http://192.168.6.216:3000/login

账户明码都是 admin 登陆后先改下管理员明码,这里演示就跳过,上面是主界面

3.2 配置数据源

1. 设置,Data sources

2.Add data source

3. 输出 Prometheus

4. 增加数据源信息

5. 测试连贯

6. 返回数据源

3.3 配置监控模板

模板库:https://grafana.com/dashboards

node_exporter 面板

抉择数据源 Prometheus,输出 exporter,抉择活跃度高的。

点击面板后,查看右侧 id 值

1. 抉择导入 Import

2. 输出 id 值,而后 load

3. 输出名称,抉择数据源,点击导入

4. 查看监控数据

5. 保留面板

mysqld_exporter 面板

还是像之前步骤一样,搜寻 mysql 关键字,找到面板,拷贝 id 7362,而后导入报存。

查看面板数据

设置,抉择 Browse

能够看到增加的 2 个模板

点击后能够看到残缺数据

以上部署后曾经有了展现模块,咱们配置下告警模块

四、alertmanager

alertmanager 是普米的告警模块,可配置各种告警规定并将告警内容发送到微信、钉钉、邮箱等。

4.1 配置 alertmanager 服务

# 1. 下载
wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz 

# 2. 解压并拷贝文件
tar xvpf alertmanager-0.17.0.linux-amd64.tar.gz
ln -s /usr/local/alertmanager-0.17.0.linux-amd64 /usr/local/alertmanager

# 3. 创立数据目录并赋权
mkdir -p /data/alertmanager
chown prometheus:prometheus /data/alertmanager -R
chown prometheus:prometheus /usr/local/alertmanager -R

# 4. 配置启动脚本
echo '
[Unit]
Description=alertmanager
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/alertmanager/alertmanager \
          --config.file=/usr/local/alertmanager/alertmanager.yml \
          --storage.path=/data/alertmanager \
          --data.retention=120h
Restart=on-failure
RestartSec=60s
[Install]
WantedBy=multi-user.target
' > /usr/lib/systemd/system/alertmanager.service

# 5. 启动
systemctl enable alertmanager.service
systemctl start alertmanager.service

# 6. 查看开启状况
[root@mgr2 alertmanager]# netstat -nltp|grep alertmanager
tcp6       0      0 :::9093                 :::*                    LISTEN      30369/alertmanager
tcp6       0      0 :::9094                 :::*                    LISTEN      30369/alertmanager

4.2 配置 dingding 告警

4.2.1 创立 dingding 告警机器人

1. 钉钉创立一个群组,取名 告警

2. 点击右上角的 设置

3. 点击 智能群助手

4. 增加机器人

5. 点击设置

6. 抉择自定义

7. 点击 增加

8. 设置下 机器人 平安设置 , 点击实现

9. 最初确认信息,点击实现

10. 设置后,群音讯会弹出欢送音讯

4.2.2 装置钉钉告警插件
# 1. 下载
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.0.0/prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz

# 2. 解压
tar xvpf prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz
mv prometheus-webhook-dingtalk-2.0.0.linux-amd64 /usr/local/
ln -s /usr/local/prometheus-webhook-dingtalk-2.0.0.linux-amd64 /usr/local/prometheus-webhook-dingtalk

# 3. 配置 config.yml
# 拷贝个模板文件 
# url 和 secret 是咱们创立告警机器人的时候呈现的 webook 和平安设置的 "加签"

cp config.example.yml config.yml

[root@mgr2 prometheus-webhook-dingtalk]# cat config.yml
templates:
  - contrib/templates/legacy/template.tmpl

targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    secret: SEC000000000000000000000

# 4. 配置启动服务
echo '
[Unit]
Description=prometheus-webhook-dingtalk
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk  \
          --config.file='/usr/local/prometheus-webhook-dingtalk/config.yml'
Restart=on-failure
RestartSec=60s
[Install]
WantedBy=multi-user.target
' > /usr/lib/systemd/system/prometheus-webhook-dingtalk.service

# 5. 赋权
chown prometheus.prometheus /usr/local/prometheus-webhook-dingtalk -R

# 6. 启动
systemctl enable prometheus-webhook-dingtalk.service
systemctl start prometheus-webhook-dingtalk.service

# 7. 确认开启了
[root@mgr2 prometheus-webhook-dingtalk]# ps -aux|grep prometheus-webhook-dingtalk
prometh+ 23162  0.0  0.3 716116  5768 ?        Ssl  15:23   0:00 /usr/local/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/usr/local/prometheus-webhook-dingtalk/config.yml              :::*                    LISTEN      15654/node_exporter

以上,钉钉告警模块就实现了

4.3 配置 rule

prometheus.yml 有个rule_files 模块会加载咱们的自定义配置信息

# 1. 创立目录
mkdir -p /usr/local/prometheus/rules.d/

# 2. 配置告警规定信息
[root@mgr2 rules.d]# cat test.rules
groups:
  - name: OsStatsAlert
    rules:
    - alert: Out of Disk Space
      expr: (1 - (node_filesystem_avail_bytes{fstype=~"ext[34]|xfs"} / node_filesystem_size_bytes{fstype=~"ext[234]|btrfs|xfs|zfs"}) ) * 100 > 15
      for: 1m
      labels:
        team: node
      annotations:
        summary: "{{$labels.instance}}: 文件系统空间使用率过高"
        description: "{{$labels.instance}}: 文件系统空间使用率超过 15% (以后使用率: {{ $value}})"

  - name: MySQLStatsAlert
    rules:
    - alert: MySQL is down
      expr: mysql_up == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Instance {{$labels.instance}} MySQL is down"
        description: "MySQL database is down."

# 3. 重启
systemctl restart prometheus
systemctl restart alertmanager

4. 敞开 MySQL 过程,察看告警信息

systemctl stop greatsql@mgr3306.service

5. 提醒告警信息

6. 异样复原后也会进行告警告诉

五、总结

以上基于 Prometheus+Grafana+ 钉钉简略部署了一个告警零碎,能够结合实际状况自行进行扩大,在生产上 Prometheus 个别采纳集群形式,避免单点故障,同时也可与 consul 联合做服务主动发现,缩小手动配置环节。

Enjoy GreatSQL :)

文章举荐:

面向金融级利用的 GreatSQL 正式开源
https://mp.weixin.qq.com/s/cI…

Changes in GreatSQL 8.0.25 (2021-8-18)
https://mp.weixin.qq.com/s/qc…

MGR 及 GreatSQL 资源汇总
https://mp.weixin.qq.com/s/qX…

GreatSQL MGR FAQ
https://mp.weixin.qq.com/s/J6…

在 Linux 下源码编译装置 GreatSQL/MySQL
https://mp.weixin.qq.com/s/WZ…

# 对于 GreatSQL

GreatSQL 是由万里数据库保护的 MySQL 分支,专一于晋升 MGR 可靠性及性能,反对 InnoDB 并行查问个性,是实用于金融级利用的 MySQL 分支版本。

Gitee:

https://gitee.com/GreatSQL/Gr…

GitHub:

https://github.com/GreatSQL/G…

Bilibili:

https://space.bilibili.com/13…

微信 &QQ 群:

可搜寻增加 GreatSQL 社区助手微信好友,发送验证信息“加群”退出 GreatSQL/MGR 交换微信群

QQ 群:533341697

微信小助手:wanlidbc

本文由博客一文多发平台 OpenWrite 公布!

退出移动版