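• Original content from the GreatSQL community may not be used without authorization. To reprint it, please contact the editor and credit the source.
  • GreatSQL is a China-developed branch of MySQL and is used in the same way as MySQL.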


I. Prometheus

    # 1. Download
    wget https://github.com/prometheus/prometheus/releases/download/v2.35.0/prometheus-2.35.0.linux-amd64.tar.gz

    # 2. Extract
    tar xvpf prometheus-2.35.0.linux-amd64.tar.gz -C /usr/local

    # 3. Create a symlink
    ln -s /usr/local/prometheus-2.35.0.linux-amd64 /usr/local/prometheus

    # 4. Create the user and data directory, then set ownership
    groupadd prometheus
    useradd prometheus -g prometheus -s /sbin/nologin
    mkdir -p /data/prometheus
    chown -R prometheus:prometheus /data/prometheus
    chown -R prometheus:prometheus /usr/local/prometheus/

    # 5. Create the systemd unit
    echo '[Unit]
    Description=prometheus
    After=network.target
    [Service]
    Type=simple
    User=prometheus
    ExecStart=/usr/local/prometheus/prometheus \
              --config.file=/usr/local/prometheus/prometheus.yml \
              --storage.tsdb.path=/data/prometheus \
              --web.console.templates=/usr/local/prometheus/consoles \
              --web.console.libraries=/usr/local/prometheus/console_libraries
    ExecReload=/bin/kill -HUP $MAINPID
    Restart=on-failure
    RestartSec=60s
    [Install]
    WantedBy=multi-user.target' > /usr/lib/systemd/system/prometheus.service

    # 6. Write the prometheus.yml configuration
    echo 'global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - localhost:9093
        scheme: http
        timeout: 10s
    rule_files:
    - /usr/local/prometheus/rules.d/*.rules
    scrape_configs:
    - job_name: prometheus
      honor_timestamps: true
      scrape_interval: 5s
      scrape_timeout: 5s
      metrics_path: /metrics
      scheme: http
      static_configs:
      - targets:
        - localhost:9090
    - job_name: node-exporter
      honor_timestamps: true
      scrape_interval: 5s
      scrape_timeout: 5s
      metrics_path: /metrics
      scheme: http
      static_configs:
      - targets:
        - localhost:9100
    - job_name: mysqld-exporter
      honor_timestamps: true
      scrape_interval: 5s
      scrape_timeout: 5s
      metrics_path: /metrics
      scheme: http
      static_configs:
      - targets:
        - localhost:9104' > /usr/local/prometheus/prometheus.yml

    # 7. Start
    systemctl enable prometheus.service
    systemctl start prometheus.service

    # 8. Confirm that it is listening
    [root@mgr2 prometheus]# netstat -nltp|grep prometheus
    tcp6       0      0 :::9090                 :::*                    LISTEN      11028/prometheus

9. Open the Prometheus admin console in a browser

http://192.168.6.216:9090
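
A service started with systemctl may need a moment before it answers requests, so it can help to poll before opening the URL above. The sketch below is a generic retry helper, not part of Prometheus: `wait_until` is a hypothetical name, and the example in the closing comment assumes curl is installed and that the Prometheus readiness endpoint `/-/ready` is reachable on localhost:9090.

```shell
# wait_until RETRIES DELAY CMD [ARGS...]
# Runs CMD until it succeeds, at most RETRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
# The body runs in a subshell, so its variables do not leak to the caller.
wait_until() (
  retries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      exit 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  exit 1
)

# Example (assumption: curl is available and Prometheus runs locally):
#   wait_until 10 2 curl -fsS http://localhost:9090/-/ready && echo "Prometheus is ready"
```

The same helper should work for the exporters and Grafana later in this article by swapping in their ports.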

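That completes the single-node Prometheus server deployment. Next, we deploy node_exporter and mysqld_exporter to collect system and MySQL monitoring data.

II. Exporters

An exporter is a client-side collection module. Besides the system-level node_exporter, each application has its own exporter, such as mysqld_exporter for MySQL.

Create a common directory for the exporters:

    mkdir -p /usr/local/prometheus_exporter
    chown -R prometheus:prometheus /usr/local/prometheus_exporter

2.1 node_exporter

This exporter monitors system metrics, including memory, CPU, disk space, disk I/O, network, and more.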

    # 1. Download and extract
    wget https://github.com/prometheus/node_exporter/releases/download/v0.18.0/node_exporter-0.18.0.linux-amd64.tar.gz
    tar xvpf node_exporter-0.18.0.linux-amd64.tar.gz
    cd node_exporter-0.18.0.linux-amd64/
    mv node_exporter /usr/local/prometheus_exporter/
    chown -R prometheus:prometheus /usr/local/prometheus_exporter/

    # 2. Create the systemd unit
    echo '[Unit]
    Description=node_exporter
    After=network.target
    [Service]
    Type=simple
    User=prometheus
    ExecStart=/usr/local/prometheus_exporter/node_exporter
    Restart=on-failure
    RestartSec=60s
    [Install]
    WantedBy=multi-user.target' > /usr/lib/systemd/system/node_exporter.service

    # 3. Start
    systemctl enable node_exporter.service
    systemctl start node_exporter.service

    # 4. Confirm that it is listening
    [root@mgr2 node_exporter]# netstat -nltp|grep node_exporter
    tcp6       0      0 :::9100                 :::*                    LISTEN      15654/node_exporter

    # 5. Confirm that data is being collected
    [root@mgr2 prometheus]# curl http://192.168.6.216:9100/metrics
    # TYPE node_cpu_seconds_total counter
    node_cpu_seconds_total{cpu="0",mode="idle"} 273849.94
    node_cpu_seconds_total{cpu="0",mode="iowait"} 607.22
    node_cpu_seconds_total{cpu="0",mode="irq"} 0
    node_cpu_seconds_total{cpu="0",mode="nice"} 84.82
    node_cpu_seconds_total{cpu="0",mode="softirq"} 3.35
    node_cpu_seconds_total{cpu="0",mode="steal"} 0
    node_cpu_seconds_total{cpu="0",mode="system"} 5026.1
    node_cpu_seconds_total{cpu="0",mode="user"} 3723.54
    # HELP node_disk_io_now The number of I/Os currently in progress.
    # TYPE node_disk_io_now gauge
    node_disk_io_now{device="dm-0"} 0
    node_disk_io_now{device="dm-1"} 0
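
The `/metrics` output is plain text with one sample per line, so individual values can be pulled out with standard tools. The helper below is a minimal sketch: `metric_value` is a hypothetical name, and it assumes label values contain no spaces, which holds for the samples shown in this article.

```shell
# metric_value NAME [LABEL_FILTER]
# Reads Prometheus text exposition format on stdin and prints the value of
# every sample whose metric name equals NAME. When LABEL_FILTER is given
# (for example 'mode="idle"'), only samples whose label set contains that
# literal text are printed. Comment lines (# HELP, # TYPE) are skipped.
metric_value() {
  awk -v name="$1" -v filter="${2:-}" '
    /^#/ { next }
    {
      series = $1          # metric name plus optional {labels}
      value = $2           # sample value
      brace = index(series, "{")
      base = (brace > 0) ? substr(series, 1, brace - 1) : series
      if (base != name) next
      if (filter != "" && index(series, filter) == 0) next
      print value
    }
  '
}
```

For example, piping `curl -s http://192.168.6.216:9100/metrics` into `metric_value node_cpu_seconds_total 'mode="idle"'` should print the idle CPU counter for each core.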

2.2 mysqld_exporter

This exporter monitors MySQL, including connection counts, replication status, InnoDB status, response status, and more.

    # 1. Download and extract
    wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.11.0/mysqld_exporter-0.11.0.linux-amd64.tar.gz
    tar xvpf mysqld_exporter-0.11.0.linux-amd64.tar.gz
    cd mysqld_exporter-0.11.0.linux-amd64
    mv mysqld_exporter /usr/local/prometheus_exporter/
    chown -R prometheus:prometheus /usr/local/prometheus_exporter/

    # 2. Create the monitoring account and grant privileges (the database is version 8.0)
    CREATE USER 'mysqlmonitor'@'127.0.0.1' IDENTIFIED BY 'mc.2022' WITH MAX_USER_CONNECTIONS 3;
    GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqlmonitor'@'127.0.0.1';
    ALTER USER 'mysqlmonitor'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'mc.2022';
    flush privileges;

    # 3. Create the systemd unit
    vi /usr/lib/systemd/system/mysqld_exporter.service
    [Unit]
    Description=mysqld_exporter
    After=network.target
    [Service]
    Type=simple
    User=prometheus
    # The Go MySQL driver expects the DSN to end with "/" (the database name separator).
    Environment='DATA_SOURCE_NAME=mysqlmonitor:mc.2022@tcp(127.0.0.1:3306)/'
    ExecStart=/usr/local/prometheus_exporter/mysqld_exporter \
              --config.my-cnf='/data/GreatSQL/my.cnf' \
              --collect.engine_innodb_status \
              --collect.slave_status \
              --web.listen-address=:9104 \
              --web.telemetry-path=/metrics
    Restart=on-failure
    RestartSec=60s
    [Install]
    WantedBy=multi-user.target

    # 4. Start
    systemctl enable mysqld_exporter.service
    systemctl start mysqld_exporter.service

    # 5. Confirm that it is listening
    [root@mgr2 prometheus]# netstat -nltp|grep mysqld_export
    tcp6       0      0 :::9104                 :::*                    LISTEN      14639/mysqld_export

    # 6. Confirm that data is being collected
    [root@mgr2 prometheus]# curl http://192.168.6.216:9104/metrics
    # TYPE mysql_up gauge
    mysql_up 1
    ......
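
The unit file above passes credentials through DATA_SOURCE_NAME and also sets --config.my-cnf. As an alternative, the password can be kept out of the unit file by putting it in a small client file. The sketch below assumes, per the exporter's README, that mysqld_exporter reads the `[client]` section of the file named by --config.my-cnf when DATA_SOURCE_NAME is not set. `write_exporter_cnf` and the file path in the usage note are hypothetical names chosen for illustration.

```shell
# write_exporter_cnf PATH USER PASSWORD [HOST] [PORT]
# Writes a MySQL client option file containing only a [client] section.
# HOST defaults to 127.0.0.1 and PORT to 3306, matching this article's setup.
write_exporter_cnf() (
  path=$1
  user=$2
  password=$3
  host=${4:-127.0.0.1}
  port=${5:-3306}
  # Newly created files get owner-only permissions because of this umask.
  umask 077
  cat > "$path" <<EOF
[client]
user=$user
password=$password
host=$host
port=$port
EOF
)
```

A possible use would be `write_exporter_cnf /usr/local/prometheus_exporter/.my.cnf mysqlmonitor 'mc.2022'`, followed by a chown to prometheus:prometheus, pointing --config.my-cnf at that file, and removing the Environment line from the unit.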

III. Grafana

Grafana visualizes the collected data and lets us organize and present it.

Grafana's data source can be Prometheus, Zabbix, Elasticsearch (ES), and others; it is a data presentation tool that supports many data interfaces.

3.1 Deployment

    # 1. Install from the RPM
    wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.3-1.x86_64.rpm
    yum install grafana-enterprise-8.5.3-1.x86_64.rpm

    # 2. Back up the original configuration first
    mv /etc/grafana/grafana.ini /etc/grafana/grafana.ini.`date +"%Y-%m-%d"`.bak

    # 3. Create directories
    mkdir -p /data/grafana
    mkdir -p /data/logs/grafana
    mkdir -p /usr/local/grafana/plugins
    chown grafana:grafana /data/grafana
    chown grafana:grafana /data/logs/grafana
    chown grafana:grafana /usr/local/grafana/plugins

    # 4. Write the replacement configuration file
    echo 'app_mode = production
    [paths]
    data = /data/grafana
    temp_data_lifetime = 24h
    logs = /data/logs/grafana
    plugins = /usr/local/grafana/plugins
    [server]
    protocol = http
    http_port = 3000
    domain = gkht
    root_url = http://192.168.6.216:3000
    enable_gzip = true
    [database]
    log_queries =
    [remote_cache]
    [session]
    provider = file
    [dataproxy]
    [analytics]
    reporting_enabled = false
    check_for_updates = false
    [security]
    admin_user = admin
    admin_password = admin
    secret_key = SW2YcwTIb9zpOOhoPsMm
    [snapshots]
    [dashboards]
    versions_to_keep = 10
    [users]
    default_theme = dark
    [auth]
    [auth.anonymous]
    enabled = true
    org_role = Viewer
    [auth.github]
    [auth.google]
    [auth.generic_oauth]
    [auth.grafana_com]
    [auth.proxy]
    [auth.basic]
    [auth.ldap]
    [smtp]
    [emails]
    [log]
    mode = console file
    level = info
    [log.console]
    [log.file]
    log_rotate = true
    daily_rotate = true
    max_days = 30
    [log.syslog]
    [alerting]
    enabled = true
    execute_alerts = true
    [explore]
    [metrics]
    enabled = true
    interval_seconds = 10
    [metrics.graphite]
    [tracing.jaeger]
    [grafana_com]
    url = https://grafana.com
    [external_image_storage]
    [external_image_storage.s3]
    [external_image_storage.webdav]
    [external_image_storage.gcs]
    [external_image_storage.azure_blob]
    [external_image_storage.local]
    [rendering]
    [enterprise]
    [panels]' > /etc/grafana/grafana.ini
    chown grafana:grafana /etc/grafana/grafana.ini

    # 5. Start
    systemctl enable grafana-server.service
    systemctl start grafana-server.service

    # 6. Confirm that it is listening
    [root@mgr2 opt]# netstat -nltp|grep grafana
    tcp6       0      0 :::3000                 :::*                    LISTEN      23647/grafana-serve

7. Open Grafana in a browser

http://192.168.6.216:3000/login

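The username and password are both admin. Change the administrator password after logging in; this demo skips that step. After login, the main page appears.

3.2 Configure the data source

1. Open Settings, then Data sources

2. Click Add data source

3. Type Prometheus in the search box

4. Fill in the data source details

5. Test the connection

6. Return to the data source list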
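
The same data source can also be declared in a file instead of through the UI. The sketch below assumes Grafana's file-based provisioning is in effect and that the RPM's default provisioning directory (/etc/grafana/provisioning/datasources/) has not been changed. `write_grafana_datasource` is a hypothetical helper name, and the default URL assumes Prometheus runs on the same host.

```shell
# write_grafana_datasource PATH [PROMETHEUS_URL]
# Writes a Grafana data source provisioning file that registers Prometheus
# as the default data source. PROMETHEUS_URL defaults to http://localhost:9090.
write_grafana_datasource() (
  path=$1
  url=${2:-http://localhost:9090}
  cat > "$path" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
)
```

For instance, `write_grafana_datasource /etc/grafana/provisioning/datasources/prometheus.yaml` followed by `systemctl restart grafana-server` should make the data source appear without any clicks, which is convenient when the setup has to be repeated on several hosts.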

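3.3 Configure dashboard templates

Dashboard library: https://grafana.com/dashboards

node_exporter dashboard

Filter by the Prometheus data source, search for exporter, and pick a dashboard that is actively used.

Click the dashboard and note the ID shown on the right.

1. Choose Import

2. Enter the ID, then click Load

3. Enter a name, select the data source, and click Import

4. View the monitoring data

5. Save the dashboard

mysqld_exporter dashboard

Follow the same steps as before: search for the keyword mysql, find the dashboard, copy its ID (7362), then import and save it.

View the dashboard data

In the menu, choose Browse

The two imported dashboards are listed

Click one to see its full data

With the visualization in place, we now configure the alerting module.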

IV. Alertmanager

Alertmanager is the alerting component of Prometheus. It receives the alerts that fire from the configured rules and delivers them to WeChat, DingTalk, email, and other channels.

4.1 Configure the alertmanager service

    # 1. Download
    wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz

    # 2. Extract into /usr/local and create a symlink
    tar xvpf alertmanager-0.17.0.linux-amd64.tar.gz -C /usr/local
    ln -s /usr/local/alertmanager-0.17.0.linux-amd64 /usr/local/alertmanager

    # 3. Create the data directory and set ownership
    mkdir -p /data/alertmanager
    chown -R prometheus:prometheus /data/alertmanager
    chown -R prometheus:prometheus /usr/local/alertmanager/

    # 4. Create the systemd unit
    echo '[Unit]
    Description=alertmanager
    After=network.target
    [Service]
    Type=simple
    User=prometheus
    ExecStart=/usr/local/alertmanager/alertmanager \
              --config.file=/usr/local/alertmanager/alertmanager.yml \
              --storage.path=/data/alertmanager \
              --data.retention=120h
    Restart=on-failure
    RestartSec=60s
    [Install]
    WantedBy=multi-user.target' > /usr/lib/systemd/system/alertmanager.service

    # 5. Start
    systemctl enable alertmanager.service
    systemctl start alertmanager.service

    # 6. Confirm that it is listening
    [root@mgr2 alertmanager]# netstat -nltp|grep alertmanager
    tcp6       0      0 :::9093                 :::*                    LISTEN      30369/alertmanager
    tcp6       0      0 :::9094                 :::*                    LISTEN      30369/alertmanager

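4.2 Configure DingTalk alerts

4.2.1 Create a DingTalk alert robot

1. Create a group in DingTalk and name it "Alerts"

2. Click Settings in the upper-right corner

3. Click Smart Group Assistant

4. Add a robot

5. Click Settings

6. Choose Custom

7. Click Add

8. Configure the robot's security settings, then click Finish

9. Confirm the details and click Finish

10. Once the robot is set up, a welcome message appears in the group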

4.2.2 Install the DingTalk alert plugin

    # 1. Download
    wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.0.0/prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz

    # 2. Extract, move into /usr/local, and create a symlink
    tar xvpf prometheus-webhook-dingtalk-2.0.0.linux-amd64.tar.gz
    mv prometheus-webhook-dingtalk-2.0.0.linux-amd64 /usr/local/
    ln -s /usr/local/prometheus-webhook-dingtalk-2.0.0.linux-amd64 /usr/local/prometheus-webhook-dingtalk

    # 3. Configure config.yml
    # Copy the example file as a starting point.
    # url and secret are the webhook address and the "sign" security setting
    # shown when the alert robot was created.
    cd /usr/local/prometheus-webhook-dingtalk
    cp config.example.yml config.yml
    [root@mgr2 prometheus-webhook-dingtalk]# cat config.yml
    templates:
      - contrib/templates/legacy/template.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
        secret: SEC000000000000000000000

    # 4. Create the systemd unit
    echo '[Unit]
    Description=prometheus-webhook-dingtalk
    After=network.target
    [Service]
    Type=simple
    User=prometheus
    ExecStart=/usr/local/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk \
              --config.file=/usr/local/prometheus-webhook-dingtalk/config.yml
    Restart=on-failure
    RestartSec=60s
    [Install]
    WantedBy=multi-user.target' > /usr/lib/systemd/system/prometheus-webhook-dingtalk.service

    # 5. Set ownership
    chown -R prometheus:prometheus /usr/local/prometheus-webhook-dingtalk/

    # 6. Start
    systemctl enable prometheus-webhook-dingtalk.service
    systemctl start prometheus-webhook-dingtalk.service

    # 7. Confirm that it is running
    [root@mgr2 prometheus-webhook-dingtalk]# ps -aux|grep prometheus-webhook-dingtalk
    prometh+ 23162  0.0  0.3 716116  5768 ?        Ssl  15:23   0:00 /usr/local/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/usr/local/prometheus-webhook-dingtalk/config.yml

That completes the DingTalk alert module.
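
The steps above do not show the contents of alertmanager.yml. For alerts to reach the robot, Alertmanager presumably needs a receiver that points at the webhook service. The sketch below writes such a file under two assumptions: prometheus-webhook-dingtalk listens on its default port 8060, and the target is named webhook1 as in config.yml. `write_alertmanager_config` is a hypothetical helper name, and the grouping intervals are illustrative values rather than the author's settings.

```shell
# write_alertmanager_config PATH [WEBHOOK_URL]
# Writes a minimal Alertmanager configuration with a single webhook receiver.
# WEBHOOK_URL defaults to the local prometheus-webhook-dingtalk endpoint for
# the target named "webhook1".
write_alertmanager_config() (
  path=$1
  webhook_url=${2:-http://localhost:8060/dingtalk/webhook1/send}
  cat > "$path" <<EOF
route:
  receiver: dingtalk
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
receivers:
  - name: dingtalk
    webhook_configs:
      - url: $webhook_url
        send_resolved: true
EOF
)
```

With the unit file from section 4.1, the target path would be /usr/local/alertmanager/alertmanager.yml, followed by a restart of alertmanager. Setting `send_resolved: true` is what allows a recovery notification to be delivered as well.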

4.3 Configure rules

The rule_files section of prometheus.yml loads our custom rule files.

    # 1. Create the directory
    mkdir -p /usr/local/prometheus/rules.d/

    # 2. Write the alert rules
    [root@mgr2 rules.d]# cat test.rules
    groups:
      - name: OsStatsAlert
        rules:
        - alert: Out of Disk Space
          expr: ( 1 - (node_filesystem_avail_bytes{fstype=~"ext[34]|xfs"} / node_filesystem_size_bytes{fstype=~"ext[234]|btrfs|xfs|zfs"}) ) * 100 > 15
          for: 1m
          labels:
            team: node
          annotations:
            summary: "{{$labels.instance}}: file system usage is too high"
            description: "{{$labels.instance}}: file system usage exceeds 15% (current usage: {{ $value }})"
      - name: MySQLStatsAlert
        rules:
        - alert: MySQL is down
          expr: mysql_up == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "Instance {{ $labels.instance }} MySQL is down"
            description: "MySQL database is down."

    # 3. Restart
    systemctl restart prometheus
    systemctl restart alertmanager
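
To see what the disk rule computes, the same arithmetic can be reproduced by hand. The sketch below mirrors the expression `(1 - avail / size) * 100 > 15` with awk. `fs_used_pct` and `exceeds` are hypothetical helper names, the byte counts in the example are made up for illustration, and the size is assumed to be greater than zero.

```shell
# fs_used_pct AVAIL_BYTES SIZE_BYTES
# Prints the used percentage with one decimal place, using the same formula
# as the alert expression: (1 - avail / size) * 100.
fs_used_pct() {
  awk -v avail="$1" -v size="$2" 'BEGIN { printf "%.1f\n", (1 - avail / size) * 100 }'
}

# exceeds VALUE THRESHOLD
# Succeeds (exit status 0) when VALUE is numerically greater than THRESHOLD.
exceeds() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

# Example with made-up numbers: 250 bytes free out of 1000 means 75.0% used,
# which is above the 15% threshold, so the rule condition would hold.
#   used=$(fs_used_pct 250 1000)
#   exceeds "$used" 15 && echo "condition holds at ${used}%"
```

A 15% threshold fires on almost any real host, which suits a demo. A production rule would likely use a much higher value.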

4. Stop the MySQL process and watch for the alert

systemctl stop greatsql@mgr3306.service

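5. The alert notification arrives

6. A notification is also sent once the failure recovers

V. Summary

The steps above set up a simple alerting system based on Prometheus, Grafana, and DingTalk, which you can extend to suit your environment. In production, Prometheus is generally deployed as a cluster to avoid a single point of failure, and it can be combined with Consul for automatic service discovery, which reduces manual configuration.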

Enjoy GreatSQL :)

Recommended articles:

GreatSQL, built for financial-grade applications, is officially open source
https://mp.weixin.qq.com/s/cI...

Changes in GreatSQL 8.0.25 (2021-8-18)
https://mp.weixin.qq.com/s/qc...

MGR and GreatSQL resource roundup
https://mp.weixin.qq.com/s/qX...

GreatSQL MGR FAQ
https://mp.weixin.qq.com/s/J6...

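Compiling and installing GreatSQL/MySQL from source on Linux
https://mp.weixin.qq.com/s/WZ...

# About GreatSQL

GreatSQL is a MySQL branch maintained by 万里数据库 (Wanli Database). It focuses on improving the reliability and performance of MGR, supports InnoDB parallel query, and is suited to financial-grade applications.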

Gitee:

https://gitee.com/GreatSQL/Gr...

GitHub:

https://github.com/GreatSQL/G...

Bilibili:

https://space.bilibili.com/13...

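WeChat & QQ groups:

Search for the GreatSQL community assistant on WeChat and add it as a friend, then send the verification message “加群” (join group) to join the GreatSQL/MGR discussion group on WeChat.

QQ group: 533341697

WeChat assistant: wanlidbc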
