About grafana: Grafana 10 New Features Explained: Experience and Collaboration Fully Upgraded

To celebrate Grafana's 10-year milestone, Grafana Labs has released Grafana 10. This commemorative release focuses on improving the user experience, making the product easier to use for all kinds of developers. Grafana v10.0.x brings developers and enterprises outstanding new features, visualizations, and collaboration capabilities, including:

- Updated panels
- Updated dashboards
- Updated navigation bar
- Updated Grafana Alerting

This article covers only part of what is new in Grafana v10.0.x; for full details, see the official Grafana documentation [1]. Now, let's walk through the new features and capabilities that Grafana 10 brings, one by one.

01 New panels

(1) XY Chart panel. Grafana v10.0.x supports a new x-y chart panel, including line charts and scatter plots.

(2) XY Trend panel. Grafana v10.0.x adds a trend chart, which lets you plot trends where the x-axis is a numeric value (x must be increasing) rather than time. This panel solves cases that neither the Time Series panel nor the XY Chart panel can handle; for example, you can plot function graphs, rpm/torque curves, or supply-and-demand relationships.

(3) DataGrid panel. Grafana v10.0.x adds a DataGrid panel that supports editing data directly inside a Grafana dashboard to customize it; you can use it to fine-tune data read from a data source or to create new data. The modified data is saved as a snapshot and does not update over time. After saving the DataGrid, select the dashboard data source in a new panel to use the fine-tuned data as its source; transforms can also be applied to the DataGrid data. ...

September 27, 2023 · 2 min · jiezi

About grafana: A Grafana Deployment Plan

Preparation

1. Create a user and configure the environment

(1) Create the user and the required directories:

```bash
[root@centos ~]# groupadd prometheus
[root@centos ~]# useradd -d /home/prometheus -g prometheus -m prometheus
[root@centos ~]# chmod 755 /home/prometheus
[root@centos ~]# mkdir -p /home/prometheus/software
[root@centos ~]# mkdir -p /home/prometheus/yunwei
[root@centos ~]# chown -R prometheus:prometheus /home/prometheus
[root@centos ~]# mkdir -p /data/grafana
[root@centos ~]# chown -R prometheus:prometheus /data/grafana
```

(2) Download: https://dl.grafana.com/oss/release/grafana-9.3.6.linux-amd64....

Deployment

1. Deploy

(1) Disable the firewall and SELinux:

```bash
[root@centos ~]# systemctl stop firewalld
[root@centos ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@centos ~]# setenforce 0
[root@centos ~]# getenforce
Permissive
[root@centos ~]# vi /etc/sysconfig/selinux
```

```ini
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
```

(2) Extract the package and back up the config file:

```bash
[prometheus@centos ~]$ tar zxf $HOME/software/grafana-9.3.6.linux-amd64.tar.gz -C $HOME
[prometheus@centos ~]$ cp $HOME/grafana-9.3.6/conf/defaults.ini $HOME/grafana-9.3.6/conf/grafana.ini
```

(3) Create the required directories:

```bash
[prometheus@centos ~]$ mkdir -p /data/grafana/grafana-9.3.6/logs
[prometheus@centos ~]$ mkdir -p /data/grafana/grafana-9.3.6/data/plugins
```

2. Adjust the config file

```bash
[prometheus@centos ~]$ vi $HOME/grafana-9.3.6/conf/grafana.ini
```

```ini
# Modify the following settings
#################################### Paths ###############################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
data = /data/grafana/grafana-9.3.6/data
# Directory where grafana can store logs
logs = /data/grafana/grafana-9.3.6/logs
# Directory where grafana will automatically scan and look for plugins
plugins = /data/grafana/grafana-9.3.6/data/plugins
#################################### Server ##############################
[server]
# The http port to use
http_port = 8666
# the path relative working path
static_root_path = public
#################################### Security ############################
[security]
# default admin user, created on startup
admin_user = admin
# default admin password, can be changed before first start of grafana, or in profile settings
admin_password = "*******"
```

3. Start/stop the service and create scripts

(1) Check the Grafana version:

```bash
[prometheus@centos ~]$ $HOME/grafana-9.3.6/bin/grafana-server -v
Version 9.3.6 (commit: 978237e7cb, branch: HEAD)
```

(2) Start script:

```bash
[prometheus@centos ~]$ vi $HOME/yunwei/grafana-9.3.6_start.sh
#!/bin/bash
#
cd $HOME/grafana-9.3.6/bin
./grafana-server -config=/home/prometheus/grafana-9.3.6/conf/grafana.ini 2>&1 &
```

(3) Stop script:

```bash
[prometheus@centos ~]$ vi $HOME/yunwei/grafana-9.3.6_stop.sh
#!/bin/bash
#
grafa_pid=`ps -ef|grep grafana-server|grep grafana.ini|awk '{print $2}'`
kill -9 $grafa_pid
```
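For long-running deployments, a systemd unit is often more robust than the nohup-style start/stop scripts above. Below is a minimal sketch, not part of the original plan: the unit name and Restart policy are my own choices, and the paths are simply reused from the steps above.

```ini
# /etc/systemd/system/grafana.service -- hypothetical unit; paths taken from the steps above
[Unit]
Description=Grafana server
After=network-online.target

[Service]
User=prometheus
ExecStart=/home/prometheus/grafana-9.3.6/bin/grafana-server -config=/home/prometheus/grafana-9.3.6/conf/grafana.ini
Restart=on-failure

[Install]
WantedBy=multi-user.target
```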

April 8, 2023 · 2 min · jiezi

About grafana: Grafana Series Part 2: Tracing with Grafana Agent and Grafana Tempo

️URL: https://grafana.com/blog/2020... ✍Author: Robert Fratto • 17 Nov 2020 Description: Here's your starter guide to configuring the Grafana Agent to collect traces and ship them to Tempo, our new distributed tracing system.

Editor's note: the code snippets were updated on 2021-06-23.

Back in March, we introduced the Grafana Agent, a subset of Prometheus built for hosted metrics. It uses much of the same battle-tested code as Prometheus and can save 40% on memory usage.

Since launch, we have kept adding features to the Agent. New additions include a clustering mechanism, extra Prometheus exporters, and support for Loki. And now, our latest feature: Grafana Tempo! It is an easy-to-operate, high-scale, low-cost distributed tracing system.

In this post, we will look at how to configure the Agent to collect traces and send them to Tempo.

Configuring Tempo support: adding trace support to your existing Agent configuration file is simple. All you need to do is add a tempo block. Those familiar with the OpenTelemetry Collector may recognize some of the settings in the following code block. ...
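The code block itself is truncated in this excerpt. As a rough illustration only: a tempo block of that era looked roughly like the sketch below. The receiver settings follow the OpenTelemetry Collector format, as the article notes, but the top-level keys changed across Agent versions (push_config was later renamed), so treat every key here as an assumption and check the Agent docs for your version; the endpoint is a placeholder.

```yaml
# Hypothetical sketch of an Agent "tempo" block; keys are assumptions, not the article's exact config.
tempo:
  configs:
    - name: default
      receivers:            # OpenTelemetry Collector-style receiver config
        jaeger:
          protocols:
            thrift_http:
      push_config:
        endpoint: tempo.example.com:55680   # placeholder Tempo endpoint
```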

January 29, 2023 · 3 min · jiezi

About grafana: [不背锅运维] Auto-Login for Grafana, Implemented in Both Go and Python

1. Goal

The goal: after the browser sends a GET request to http://192.168.11.254:3090/au..., the user is automatically logged in to Grafana.

2. Approach

An extra API needs to be developed to handle the user's login request. There are two key points:

- Log in to Grafana from code and obtain the cookie.
- Redirect while carrying that cookie.

Note: to keep things simple, this API program should run on the same machine as the Grafana service; otherwise you run into cross-origin problems, and carrying the cookie across origins becomes awkward. Not impossible, just considerably more troublesome to handle.

3. Analysis

Analyzing the cookie: after a successful manual login with valid credentials, the server writes a cookie to the browser whose key is grafana_session, see the screenshot below.

Analyzing the login form: the username field sent to the backend is user; the password field sent to the backend is password; the path that handles authentication is /login (you can actually see this in the address bar, but it is worth confirming by inspection).

With all of that known, let's write the API that handles the login request, implemented in both Go and Python.

4. Go implementation

```go
package main

import (
	"io/ioutil"
	"log"
	"net/http"
	"strings"
)

const login_url = "http://192.168.11.254:3000/login"
const home_url = "http://192.168.11.254:3000/"

// Log in with the admin account to obtain the session cookie; my password here is 1qaz#EDC
func GetSession(url string) string {
	method := "POST"

	payload := strings.NewReader(`{` + " " + `"user": "admin",` + " " + `"password": "1qaz#EDC"` + " " + `}`)

	client := &http.Client{}
	req, err := http.NewRequest(method, url, payload)
	if err != nil {
		log.Println(err)
	}
	req.Header.Add("Content-Type", "application/json")
	res, err := client.Do(req)
	if err != nil {
		log.Println(err)
	}
	defer res.Body.Close()
	body, err := ioutil.ReadAll(res.Body)
	if err != nil {
		log.Println(err)
	}
	log.Println(string(body))
	cookie := res.Cookies()[0].Value
	return cookie
}

// Handler
func AutoLogin(w http.ResponseWriter, r *http.Request) {
	session := GetSession(login_url)
	if r.Method == "GET" {
		// Write the cookie to the browser
		cookie := http.Cookie{
			Name:  "grafana_session",
			Value: session,
		}
		http.SetCookie(w, &cookie)
		// Redirect
		http.Redirect(w, r, home_url, http.StatusMovedPermanently)
	}
}

// Start the HTTP server and set up routing
func Api() {
	http.HandleFunc("/auto_login", AutoLogin)
	err := http.ListenAndServe(":3080", nil)
	if err != nil {
		log.Println("ListenAndserve:", err)
	}
}

func main() {
	Api()
}
```

5. Python implementation

```python
import json
import requests
from flask import Flask, request, redirect, make_response

app = Flask(__name__)

login_url = "http://192.168.11.254:3000/login"
home_url = "http://192.168.11.254:3000/"


def get_session():
    payload = json.dumps({
        "user": "admin",
        "password": "1qaz#EDC"
    })
    headers = {
        'Content-Type': 'application/json'
    }
    response = requests.request("POST", login_url, headers=headers, data=payload)
    cookie = response.cookies.items()[0][1]
    return cookie


@app.route('/auto_login', methods=['GET'])
def auto_login():
    if request.method == 'GET':
        cookie = get_session()
        response = make_response(redirect(home_url))
        response.set_cookie('grafana_session', cookie)
        return response


if __name__ == "__main__":
    app.run("0.0.0.0", 3080)
```

6. Testing

The code is done; time to test. The Go and Python implementations achieve the same result; please test them yourself. Visit http://192.168.11.254:3090/au... in a browser and the auto-login completes.

Final note: with the Go implementation, after the first login and a normal logout, logging in again through the API fails to write the cookie to the browser during the redirect to the target address, landing you straight on the login page; clearing the browser history and cookies (mainly the cookies) gets you in normally again. I am still digging into this issue. If you know how to fix it, please message me, thanks. This article is reposted from (follow us if you like it): https://mp.weixin.qq.com/s/FN...
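One guess about that final issue, offered as an assumption rather than a confirmed fix: http.StatusMovedPermanently is a 301, which browsers cache aggressively, so a repeat visit to /auto_login may be served from the browser's redirect cache without ever reaching the handler, and thus without a fresh Set-Cookie. Switching to a non-cached 302 is a cheap thing to try:

```go
// Hypothetical tweak to the AutoLogin handler: use 302 (Found) so the browser
// re-requests /auto_login every time instead of replaying a cached 301 redirect.
http.Redirect(w, r, home_url, http.StatusFound)
```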

November 28, 2022 · 1 min · jiezi

About grafana: Installing Grafana with Docker

Installing Grafana with Docker

Grafana is an open-source data visualization tool developed in Go. It can do monitoring and statistics and comes with alerting. Many companies use Grafana, such as PayPal, eBay, and Intel.

Grafana is a dashboard and graph editor for Graphite and InfluxDB. It is an open-source, fully featured metrics dashboard and graph editor supporting Graphite, InfluxDB, OpenTSDB, and more.

Features:

- Visualization: fast and flexible client-side graphs with many options. Panel plugins provide many different ways to visualize metrics and logs.
- Alerting: visually define alert rules for your most important metrics. Grafana continuously evaluates them and sends notifications.
- Notifications: when an alert changes state, it sends out notifications, for example by email.
- Dynamic dashboards: create dynamic and reusable dashboards with template variables that appear as dropdowns at the top of the dashboard.
- Mixed data sources: mix different data sources in the same graph! A data source can be specified per query; this even works for custom data sources.
- Annotations: annotate graphs with events from different data sources. Hovering over an event shows the full event metadata and tags.
- Filters: ad-hoc filters let you dynamically create new key/value filters, which are automatically applied to all queries using that data source.

Official documentation: https://grafana.com/docs/graf...

Install Docker: offline Docker installation on CentOS 7; Docker installation on Huawei Cloud ARM.

Run Grafana with Docker:

```bash
docker run -d -p 3000:3000 grafana/grafana
# or pin a version
docker run -d -p 3000:3000 --name grafana grafana/grafana:6.5.0
```

Connect to the container as root:

```bash
docker exec -it --user root grafana /bin/bash
```

You can enable anonymous login and embed the display in your own system via an iframe.

Access: http://192.168.0.1:3000/ Default username and password: admin/admin
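The article mentions anonymous login and iframe embedding without showing the settings; as a minimal sketch (these are standard Grafana environment variables, not taken from the original post, and allow_embedding requires a reasonably recent Grafana), something like:

```bash
# Sketch: enable anonymous access and iframe embedding via Grafana env vars
docker run -d -p 3000:3000 --name grafana \
  -e GF_AUTH_ANONYMOUS_ENABLED=true \
  -e GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer \
  -e GF_SECURITY_ALLOW_EMBEDDING=true \
  grafana/grafana
```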

October 31, 2022 · 1 min · jiezi

About grafana: Showing the Alertmanager Silence List on a Grafana Dashboard

```
# Alert
GET /api/v2/alerts
POST /api/v2/alerts
# AlertGroup
GET /api/v2/alerts/groups
# General
GET /api/v2/status
# Receiver
GET /api/v2/receivers
# Silence
GET /api/v2/silences
POST /api/v2/silences
GET /api/v2/silence/{silenceID}
DELETE /api/v2/silence/{silenceID}
```

/api/v2/silences

```python
import requests, json, time, datetime
from influxdb import InfluxDBClient

DataBasename = "ixxxx"
conn_db = InfluxDBClient('10.26.x.xx', '8086', 'litx', 'xxxxx', DataBasename)
time_now = time.strftime('%Y-%m-%d %H:%M:%S')


def instert_alertmanager_silenced(table, silence_hours, startsAt, endsAt, matchers):
    silence_hours1 = int(silence_hours)
    json_body = [
        {
            "measurement": table,
            "tags": {
                "startsAt": startsAt,
                "endsAt": endsAt,
                "matchers": matchers
            },
            "fields": {"silence_hours": silence_hours1}
        }
    ]
    conn_db.write_points(json_body)  # write the point; creates the measurement if it does not exist


def transformation_time(time_initial):
    time_trans = time_initial.split(".")
    # print(time_trans[0])
    result = datetime.datetime.strptime(time_trans[0], '%Y-%m-%dT%H:%M:%S')
    result8 = (result + datetime.timedelta(hours=8)).strftime("%Y-%m-%d %H:%M:%S")
    # print(result, result8)
    return result8


alertmanager_silence_url = 'http://alertmanager.int.xiaxxxshu.com/api/v2/silences'
json_alertmanager_silence = requests.get(alertmanager_silence_url)
list_alertmanager_silence = json_alertmanager_silence.json()
# print(list_alertmanager_silence)
silence_activce_list = []
for i in list_alertmanager_silence:
    if i['status']['state'] == 'active':
        silence_dic = {}
        end_time = i['endsAt']
        silence_dic['startsAt'] = i['startsAt']
        silence_dic['endsAt'] = end_time
        # print(transformation_time(i['startsAt']), transformation_time(i['endsAt']), time_now)
        d_end = datetime.datetime.strptime(transformation_time(end_time), '%Y-%m-%d %H:%M:%S')
        d_now = datetime.datetime.strptime(time_now, '%Y-%m-%d %H:%M:%S')
        delta = d_end - d_now
        # print(int(delta.seconds/3600) + delta.days * 24)
        silence_dic['silence_hours'] = int(delta.seconds/3600) + delta.days * 24
        for j in i['matchers']:
            matchers_list = []
            matchers_dic = {}
            matchers_dic['name'] = j['name']
            matchers_dic['value'] = j['value']
            matchers_list.append(matchers_dic)
            silence_dic['matchers'] = matchers_list
        silence_activce_list.append(silence_dic)
print(silence_activce_list)
for i in silence_activce_list:
    instert_alertmanager_silenced(table='alertmanager_silenced',
                                  silence_hours=i['silence_hours'],
                                  startsAt=i['startsAt'],
                                  endsAt=i['endsAt'],
                                  matchers=i['matchers'])
```
 ...
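To sanity-check the API before wiring up the script, the silence list can be fetched directly with one of the endpoints listed above (the Alertmanager host and port here are placeholders):

```bash
# Query the silences straight from the Alertmanager v2 API
curl -s http://alertmanager.example.com:9093/api/v2/silences | python -m json.tool
```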

June 24, 2022 · 1 min · jiezi

About grafana: Grafana Reports "not an object" When Connecting to InfluxDB

When connecting Grafana to InfluxDB, the error can't assign to property "executedQueryString" on "hello world! ": not an object appears. The fix: don't forget to append port 8086 to the data source URL, e.g. http://localhost:8086.

June 19, 2022 · 1 min · jiezi

About grafana: Grafana Dashboard Permission Control

Create a team, grant it the permission, then remove the original Viewer entry.

June 15, 2022 · 1 min · jiezi

About grafana: Adding a Webhook Alert in Grafana

April 7, 2022 · 0 min · jiezi

About grafana: Installing and Deploying the Grafana Server on CentOS 7

1. Reference links

- Alibaba open-source mirror site (OPSX) - Alibaba Cloud developer community
- grafana mirror - grafana download - grafana installation guide - Alibaba open-source mirror site
- centos mirror - centos download - centos installation guide - Alibaba open-source mirror site
- What Grafana does - Grafana download - Grafana features

2. About Grafana

What is Grafana: Grafana is a visualization dashboard with beautiful charts and layouts, a fully featured metrics dashboard and graph editor. It supports Graphite, Zabbix, InfluxDB, Prometheus, and OpenTSDB as data sources.

Grafana features:
1. Fast, flexible client-side charts; panel plugins offer many visualization styles for metrics and logs; the official library has a rich set of dashboard plugins such as heatmaps, line charts, and other chart types, so even complex data is presented beautifully and elegantly.
2. Grafana supports many different time-series storage backends (data sources), each with its own query editor. Officially supported: Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch. Query languages and capabilities differ markedly per source. You can combine data from multiple sources on one dashboard, but each panel is bound to a specific data source belonging to a specific organization.
3. Alerting in Grafana lets you attach rules to dashboard panels. When you save the dashboard, Grafana extracts the alert rules into a separate alert-rule store and schedules them for evaluation. Alert messages can also be pushed to mobile via DingTalk, email, and so on. At present, however, Grafana only supports alerting on graph panels.
4. Grafana annotates charts with rich events from different data sources; hovering over an event shows the full event metadata and tags.
5. Grafana's ad-hoc filters allow dynamically creating new key/value filters, which are automatically applied to all queries using that data source.

3. Grafana installation steps

This walkthrough is built on CentOS 7.9.

1. Basic environment setup

Change the hostname:

```bash
[root@localhost ~]# hostnamectl set-hostname grafana
[root@localhost ~]# bash
[root@grafana ~]# hostnamectl
   Static hostname: grafana
         Icon name: computer-vm
           Chassis: vm
        Machine ID: db3692199b194e6b9ac9f92ef24f9c6e
           Boot ID: 56bd71938e91499ca3106ce091c032ef
    Virtualization: vmware
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-1160.el7.x86_64
      Architecture: x86-64
```

Disable the firewall and SELinux:

```bash
systemctl stop firewalld
systemctl disable firewalld
[root@grafana ~]# setenforce 0
[root@grafana ~]# getenforce
Permissive
```

Configure the NIC:

```bash
[root@grafana ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens32
[root@grafana ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens32
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens32
DEVICE=ens32
ONBOOT=yes
IPADDR=192.168.200.100
PREFIX=24
GATEWAY=192.168.200.1
DNS1=114.114.114.114
DNS2=192.168.200.1
```

Configure the Aliyun CentOS mirror:

```bash
[root@grafana yum.repos.d]# curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2523  100  2523    0     0  14587      0 --:--:-- --:--:-- --:--:-- 14668
[root@grafana yum.repos.d]# ll
total 4
drwxr-xr-x. 2 root root  220 Feb 11 12:27 bak
-rw-r--r--. 1 root root 2523 Feb 11 12:27 CentOS-Base.repo
[root@grafana yum.repos.d]# yum clean all
[root@grafana yum.repos.d]# yum makecache
[root@grafana yum.repos.d]# yum repolist
```

Update the CentOS system ...
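The excerpt cuts off before Grafana itself gets installed. A minimal sketch of the usual next step follows; the repo definition uses the official packages.grafana.com OSS repo as an assumption (the Aliyun grafana mirror from the reference links can be substituted as baseurl):

```bash
# Sketch: define a Grafana OSS yum repo and install (repo URL is the official one, not from the article)
cat <<'EOF' > /etc/yum.repos.d/grafana.repo
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
EOF
yum install -y grafana
systemctl enable --now grafana-server
```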

February 18, 2022 · 1 min · jiezi

About grafana: Getting Started with Grafana k6

Hi everyone, I'm Zhang Jintao. In this post I will introduce a tool, k6. It has no direct relationship with K8s; it is an open-source load-testing tool.

The story behind k6: in August 2016, k6 published its first release on GitHub, and with that an excellent open-source load-testing tool came into people's view.

June 2021 was a big day for both Grafana and k6: Grafana Labs acquired k6. In fact, the bond between Grafana and k6 goes back two more years. In 2019, while load-testing the short-lived token refresh behavior for Grafana 6.0, Grafana Labs went through a round of tool selection. Since most of Grafana Labs' backend software is implemented in Go, k6 happened to satisfy both the OSS and the Go requirements, and its load tests are written in JS (which Grafana's frontend framework and UI already use). As a result, from Grafana 6.0 onward, k6 has continuously served Grafana developers and testers in tracking down bugs.

Figure 1: k6 joins Grafana Labs ...

December 17, 2021 · 4 min · jiezi

About grafana: Flink Real-Time Metrics

Flink real-time metrics

Our Flink jobs currently run on a YARN cluster, and we face questions like:

- Are the long-running real-time jobs stable?
- How is the real-time processing capacity? Is consumption too slow? Do we need to request more resources to raise throughput?
- Is the real-time data quality reliable? Is there any risk of losing data?
- Are the job's current resources enough for the current data volume, or are resources sitting idle and wasted?

Although the Flink web UI provides some monitoring information, it is not developer-friendly enough, so we built a real-time monitoring dashboard with Flink metrics + Prometheus + Grafana, which helps collect the live state of Flink jobs.

First, an introduction to Flink metrics.

Metric types:

- Counter: a value that changes cumulatively following some trend (increase/decrease).
- Gauge: an instantaneous value, unrelated to time, free to rise or fall arbitrarily; often used to record memory usage, disk usage, and the like.
- Histogram: the statistical distribution of the collected data.
- Meter: measures the rate at which a series of events occurs.

Metric reporters: metrics information can be configured in flink-conf.yaml and reported to external systems in real time when the job starts; a sketch is shown after this list.

System metrics: Flink predefines some metrics internally, covering CPU, memory, IO, threads, network, JVM garbage collection, and so on.

User-defined metrics: you can define your own monitoring metrics according to your business needs:

```scala
val counter = getRuntimeContext()
  .getMetricGroup()
  .addGroup("MyMetricsKey", "MyMetricsValue")
  .counter("myCounter")
```

Building the metric monitoring

Sorting out the monitoring targets:

a. System metrics: monitoring the number of jobs; monitoring the count of long-running jobs to promptly detect restarts and failures during execution; the numRecordsIn and numRecordsOut of operator message processing ...
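The article mentions configuring reporters in flink-conf.yaml without showing one. As a sketch: this is the standard Flink 1.x PrometheusReporter configuration, not taken from the article, and the port range is an arbitrary choice.

```yaml
# flink-conf.yaml -- expose Flink metrics for Prometheus to scrape
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9250-9260   # each JM/TM picks a free port in this range
```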

September 23, 2021 · 2 min · jiezi

About grafana: Custom Monitoring Built on Grafana

A backend developer working at TX, sharing backend tech, machine learning, data structures and algorithms, CS fundamentals, programmer interviews, and more. Welcome to follow the public account "任冬学编程".

Custom monitoring built on Grafana

Preface: since this article is fairly detailed and varied, the structure is given up front to make it easier to follow the train of thought. This is also my first KM article; if anything is lacking, please point it out, thanks!

(Article structure diagram)

1. Background

1.1 Cross-monitoring. Monitoring is a crucial part of the entire product lifecycle: ops cares about hardware and infrastructure monitoring, developers care about middleware and application-layer monitoring, product cares about core business metrics. Self-monitoring of the full pipeline of data reporting, transport, storage, and consumption gives real-time metric collection, failure prediction, and alerting. But if the self-monitoring itself goes down, how do you find out? That is where external cross-monitoring comes in: it monitors the lifecycle of the self-monitoring system.

1.2 Choosing a monitoring tool. As the saying goes, "no monitoring, no ops"; the position of a monitoring system is self-evident. For selection criteria and monitoring basics, see the referenced article on monitoring system selection. For my needs, the cross-monitoring system must ingest from an ES data source, visualize in real time, and alert promptly over multiple channels when thresholds are hit. The ELK stack's Kibana can visualize too, but it is not that pleasant to use. :-(

Grafana is a cross-platform open-source visualization tool: you configure data sources and then run complex queries against them and display the results. It supports up to 14 data sources including MySQL and Elasticsearch, and most of them support alert configuration, so in the end I chose the simple, convenient Grafana!

Summary: external inspection plus internal self-check; cross-monitoring avoids having nowhere to query monitoring data when internal monitoring dies, strengthening monitoring depth and layering. Make charts multi-level: grouping charts by their relationships and building hierarchies is critical to quickly locating the root cause of a problem. And Grafana's custom monitoring configuration is convenient and simple.

2. Hands-on

With the cross-monitoring tool settled as Grafana, the requirements are ES data-source ingestion, real-time visualization, and multi-channel alerting on thresholds. Into the practical part.

2.1 Configure the DataSource: set up the data-source parameters to produce a valid data source (choose the data source type, then fill in its settings).

2.2 Configure the Dashboard: with the data source configured, you can add and configure your own panels. Several visualization types exist; here we take graph as an example. Add or configure the dashboard as shown. The red box at the top right covers: new, star, share, save, settings, query mode, time range, zoom out (widen the time range, i.e. swap a small window for a large one), refresh, etc.

2.3 Configure Variables: template variables make later queries convenient; combined with flexible dropdowns, they let you aggregate monitoring charts on the fly and locate problems quickly.

For the Query statement settings, refer to the official documentation:

| Query | Description |
| --- | --- |
| {"find": "fields", "type": "keyword"} | Returns a list of field names of index type keyword. |
| {"find": "terms", "field": "@hostname", "size": 1000} | Returns a list of values for the field using a terms aggregation. The query uses the current dashboard time range as its time range. |
| {"find": "terms", "field": "@hostname", "query": '<lucene query>'} | Returns a list of field values using a terms aggregation and the specified lucene query filter. The query uses the current dashboard time range as its time range. |

After the variables are validly configured, save and preview the effect. Note that this only sets up the variables; the dropdowns have no effect on data queries until the variables are bound in the query statements!

2.4 Hooking up Xingyun alerting. The Xingyun alert management system is a general alert-oriented solution providing alert ingestion and reporting, muting, subscription, aggregation, recovery, querying, notification (phone, WeChat, WeCom, email, mini-program), escalation, automated handling, and statistical analysis. Through structured alert definitions and rich open APIs, it is highly customizable; it also integrates seamlessly with Tencent Cloud's Xingyun ticketing system, on-call system, and workflow engine, giving it strong automation capabilities. It currently serves mainly Tencent Cloud's base IaaS ops scenarios and has ingested most of the cloud's base alerts.

Xingyun alert management system address: alert query.

2.4.1 Configuration. Step one: check whether a Xingyun alert channel already exists; if not, add one under Alerting -> Notification channels. Note: this channel only needs to be configured on first use of the interface; once is enough. If a Xingyun channel already exists, go straight to step two. Step two: configure the alert conditions via the Alert option on the left of the panel page.

Note that Grafana will warn that template variables cannot be used in alert queries. Two workarounds:

1. Multi-query: if query A (with template variables) cannot be used for alerting, add a new query B whose query statement uses no template variables and configure the alert on B. (Remember to hide B's series to avoid overlapping graphs.)
2. Dual panels: one panel purely for monitoring and another just for alerting; otherwise configured the same as option 1.

The two approaches differ little; multi-panel may be a bit more convenient and intuitive. With the template-variable problem solved, back to the main thread: configuring the alert conditions ...

July 20, 2021 · 1 min · jiezi

About grafana: What Makes Loki, Which Does Not Index Full Log Content, Good Enough to Claim Part of the Log Monitoring Space

A summary of Loki's strengths:

1. Low index overhead. The biggest difference between Loki and ES is that Loki indexes only labels, not content. This dramatically reduces index resource costs (with ES, you bear the huge index cost at all times, whether you query or not).
2. Parallel queries plus caches. To compensate for the query slowdown caused by having no full-text index, Loki splits queries into smaller shards, which you can think of as a parallel grep, and supports index, chunk, and result caches to speed things up.
3. The same labels as Prometheus, with Alertmanager integration. Label consistency between Loki and Prometheus is one of Loki's superpowers.
4. Grafana as the frontend, avoiding switching back and forth between Kibana and Grafana.

Architecture notes: https://grafana.com/docs/loki...

Components:

- promtail is the collector, analogous to filebeat.
- loki is the server side, analogous to ES.
- The loki process contains four roles: querier, ingester (log store), query-frontend, and distributor (write fan-out). The role to run can be specified with the loki binary's -target flag.

Read path: the querier receives an HTTP/1 data request. The querier forwards the query to all ingesters, asking for in-memory data. The ingesters receive the read request and return matching data, if any. If no ingester returns data, the querier lazily loads data from the backing store and runs the query against it. The querier iterates over all received data, deduplicates, and returns the final result set over the HTTP/1 connection.

Write path: the distributor receives an HTTP/1 request to store stream data. Each stream is hashed using the hash ring. The distributor sends each stream to the appropriate ingester and its replicas (based on the configured replication factor). Each instance creates a chunk for the stream's data or appends it to an existing chunk; chunks are unique per tenant and per label set. The distributor responds over HTTP/1 with a success code.

Installing in single-binary mode

Download the promtail and loki binaries:

```bash
wget https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip
wget https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip
```

Pick a Linux machine for testing. Install promtail:

```bash
mkdir /opt/app/{promtail,loki} -pv

# promtail config file
cat <<EOF> /opt/app/promtail/promtail.yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/log/positions.yaml # This location needs to be writeable by promtail.

client:
  url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    pipeline_stages:
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs # A \`job\` label is fairly standard in prometheus and useful for linking metrics and logs.
          host: yourhost # A \`host\` label will help identify logs from this machine vs others
          __path__: /var/log/*.log # The path matching uses a third party library: https://github.com/bmatcuk/doublestar
EOF

# service file
cat <<EOF >/etc/systemd/system/promtail.service
[Unit]
Description=promtail server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/opt/app/promtail/promtail -config.file=/opt/app/promtail/promtail.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=promtail

[Install]
WantedBy=default.target
EOF

systemctl daemon-reload
systemctl restart promtail
systemctl status promtail
```

Install loki:

```bash
mkdir /opt/app/{promtail,loki} -pv

# loki config file
cat <<EOF> /opt/app/loki/loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /opt/app/loki/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /opt/app/loki/boltdb-shipper-active
    cache_location: /opt/app/loki/boltdb-shipper-cache
    cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /opt/app/loki/chunks

compactor:
  working_directory: /opt/app/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /opt/app/loki/rules
  rule_path: /opt/app/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF

# service file
cat <<EOF >/etc/systemd/system/loki.service
[Unit]
Description=loki server
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/opt/app/loki/loki -config.file=/opt/app/loki/loki.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=loki

[Install]
WantedBy=default.target
EOF

systemctl daemon-reload
systemctl restart loki
systemctl status loki
```

Configure the Loki data source in Grafana, then view logs in Grafana Explore:

rate({job="message"} |="kubelet" ...
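As a complete example of the LogQL the truncated line above hints at, here is a query using the varlogs job label from the promtail config above (the filter string and window are arbitrary choices):

```
# per-second rate of log lines containing "error" for the varlogs job, over 5m windows
rate({job="varlogs"} |= "error" [5m])
```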

May 11, 2021 · 3 min · jiezi

About prometheus: Prometheus + Grafana Quick Start

A quick start with Prometheus + Grafana: monitoring a host's CPU, GPU, MEM, IO, and other state.

Prerequisite: Docker.

Client side

Node Exporter: collects data from hosts with UNIX kernels. Download and extract it:

```bash
wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
tar xvfz node_exporter-1.1.2.linux-amd64.tar.gz
cd node_exporter-1.1.2.linux-amd64
nohup ./node_exporter &
```

Check the data:

```bash
$ curl http://localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
...
```

DCGM Exporter: collects NVIDIA GPU data; run it as a Docker image:

```bash
docker run -d --restart=always --gpus all -p 9400:9400 nvidia/dcgm-exporter
```

Check the data:

```bash
$ curl localhost:9400/metrics
# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# TYPE DCGM_FI_DEV_MEM_CLOCK gauge
# HELP DCGM_FI_DEV_MEMORY_TEMP Memory temperature (in C).
...
```

Server side

Prometheus: configure ~/prometheus.yml: ...
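The excerpt stops exactly where ~/prometheus.yml begins. As a minimal sketch of a scrape config covering the two exporters above (not the article's actual file; <host-ip> is a placeholder, the ports come from the article):

```yaml
# ~/prometheus.yml -- minimal scrape config for node_exporter (9100) and dcgm-exporter (9400)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['<host-ip>:9100']
  - job_name: dcgm
    static_configs:
      - targets: ['<host-ip>:9400']
```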

April 23, 2021 · 1 min · jiezi

About grafana: Installing and Setting Up Prometheus + Grafana

Introduction

Prometheus is an open-source monitoring/alerting system and time-series database (TSDB) developed at SoundCloud. Written in Go, it is the open-source counterpart of Google's BorgMon monitoring system. In 2016, the Cloud Native Computing Foundation under the Linux Foundation, initiated by Google, took Prometheus in as its second major open-source project. Prometheus remains quite active in the open-source community.

Compared with Heapster (a K8S subproject for collecting cluster performance data), Prometheus is more complete and more comprehensive, and its performance is enough to support clusters at the scale of tens of thousands of machines.

Prometheus features:

- A multi-dimensional data model.
- A flexible query language.
- No reliance on distributed storage; individual server nodes are autonomous.
- Time-series collection via an HTTP-based pull model.
- Time-series pushing via an intermediate gateway.
- Target discovery via service discovery or static configuration.
- Support for many kinds of charts and UIs, such as Grafana.

Architecture diagram

The rough workflow of the Prometheus service:

- Prometheus periodically scrapes metrics data from targets; every scrape target must expose an HTTP endpoint for it to scrape on schedule. Prometheus supports specifying targets via config files, text files, Zookeeper, Consul, DNS SRV lookup, and more.
- Prometheus monitors by PULL: the server either pulls data directly from targets, or indirectly via an intermediate gateway that data was pushed to.
- Prometheus stores all scraped data locally, cleans and organizes the data according to certain rules, and stores the results in new time series.
- Prometheus visualizes the collected data through PromQL and other APIs. It supports many visualization approaches, such as Grafana, the built-in Promdash, and its own template engine; it also provides an HTTP query API for customized output.
- PushGateway lets clients actively push metrics to it; Prometheus merely scrapes the gateway periodically.
- Alertmanager is a component independent of Prometheus; it supports Prometheus query statements and offers extremely flexible alerting.
- Prometheus also supports obtaining metrics over the SNMP protocol: configure a job and use snmp_export to read device monitoring information.

Metric types:

- Counter: a counter, accumulating from 0; ideally it grows forever. Used to accumulate things like request counts.
- Gauge: an instantaneous value that can change arbitrarily; suitable for CPU usage, temperature, and so on.
- Histogram: samples data over a time range and provides sums and counts of all values, as a bucketed distribution; e.g. grouping a metric over a period, such as HTTP response sizes or request latencies.
- Summary: likewise produces multiple series, with the suffixes _bucket (histogram only), _sum, and _count.

Both Histogram and Summary can yield quantiles. To get quantiles from a Histogram, the raw histogram data is collected into Prometheus and the quantile is computed with the query function histogram_quantile(); see the example after this section. A Summary computes the quantiles directly in the application. "Histograms and summaries" explains the differences between the two, in particular that Summary quantiles cannot be aggregated. Note that this "cannot" is not a performance limitation: aggregating quantiles is usually just meaningless. "LatencyTipOfTheDay: You can't average percentiles. Period." explains in detail why quantiles cannot be added and averaged: quantiles exist to partition data, and their average does not carry the same quantile meaning.

Our monitoring mainly uses the top two types; I have not yet touched the bottom two. The text and introduction above are quoted from lijiaocn.

Installing Prometheus

This setup uses Docker. The whole setup needs two containers; alerting is not configured for now, only monitoring data.

Prerequisites:

- Setup location: /home/aLong/prometheus/
- Environment: Docker 19.03.1 (consult the official docs if you need a specific version)
- OS: CentOS 7

Preparation: create the Prometheus config file prometheus.yml at the root of the setup location with `touch prometheus.yml`, and add a test/demo configuration:

```yaml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
scrape_configs:
- job_name: prometheus
  honor_timestamps: true
  scrape_interval: 5s
  scrape_timeout: 3s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - localhost:9090
```

Note that the config file format is YAML; for syntax questions see here. ...
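As a concrete example of the histogram_quantile() approach mentioned above (the metric name is a conventional placeholder, not from this article):

```
# 95th-percentile request latency over 5m, computed from histogram buckets
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```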

December 10, 2020 · 1 min · jiezi

About grafana: Deploying Telegraf + InfluxDB + Grafana to Monitor MySQL

A while back I wrote "Building a Classy Nginx Cluster Monitoring System with Nginx+Telegraf+InfluxDB+Grafana!", which covered deploying the telegraf collector, the InfluxDB time-series database, and Grafana for display, plus adding host-side monitoring and charts. This article follows up with quick MySQL monitoring using the Telegraf + InfluxDB + Grafana architecture.

First, the end result: (screenshot)

1. Add a telegraf config for collecting MySQL

Since telegraf was already deployed earlier and is collecting host-side CPU, memory, disk, network-traffic and similar information, it is best to keep the MySQL collection separate from the existing telegraf.conf:

```toml
[root@fxkj ~]# vim /etc/telegraf/telegraf.d/telegraf_mysql.conf
[[outputs.influxdb]]
  database = "mysql_metrics"
  urls = ["http://127.0.0.1:8086"]
  namepass = ["*_mysql"]
  username = "fxkj"
  password = "123456"

[[inputs.mysql]]
  servers = ["root:123456@tcp(localhost:3306)/?tls=false"]
  name_suffix = "_mysql"
```

- database: the database name; all collected data goes into this database
- urls: the InfluxDB address
- servers: contains the username and password of the authorized MySQL user plus the MySQL connection address
- name_suffix: a suffix appended to measurement names

2. Restart the telegraf collector and watch the log refresh

```bash
[root@fxkj ~]# systemctl restart telegraf.service
[root@fxkj ~]# tail -n 10 /tmp/telegraf.log
2020-08-11T01:37:20Z E! [outputs.influxdb] when writing to [http://localhost:8086]: Post
2020-08-11T01:37:42Z I! Loaded processors:
2020-08-11T01:37:42Z I! Loaded outputs: influxdb influxdb
2020-08-11T01:37:42Z I! Tags enabled: host=fxkjnj.com
2020-08-11T01:37:42Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"fxkj", Flush Interval:10s
```

3. Log in to InfluxDB and inspect the MySQL monitoring data

```bash
[root@aly mysql]# influx -username fxkj -password '123456'
Connected to http://localhost:8086 version 1.8.1
InfluxDB shell version: 1.8.1
> show databases;
name: databases
name
----
_internal
telegraf
mysql_metrics
> use mysql_metrics;
Using database mysql_metrics
> show measurements;
name: measurements
name
----
mysql_mysql
```

You can see a mysql_metrics database and a measurement named mysql_mysql. Check which fields the measurement has:

```bash
> show field keys from mysql_mysql;
name: mysql_mysql
fieldKey                                 fieldType
--------                                 ---------
aborted_clients                          integer
aborted_connects                         integer
access_denied_errors                     integer
busy_time                                integer
bytes_received                           integer
bytes_sent                               integer
commands_admin_commands                  integer
commands_create_index                    integer
commands_create_procedure                integer
commands_drop_db                         integer
commands_drop_event                      integer
commands_drop_function                   integer
commands_drop_index                      integer
commands_stmt_reprepare                  integer
handler_mrr_key_refills                  integer
handler_read_rnd                         integer
handler_read_rnd_deleted                 integer
handler_read_rnd_next                    integer
handler_rollback                         integer
handler_savepoint                        integer
handler_savepoint_rollback               integer
innodb_buffer_pool_pages_free            integer
innodb_buffer_pool_pages_made_not_young  integer
innodb_buffer_pool_pages_made_young      integer
innodb_dblwr_writes                      integer
innodb_deadlocks                         integer
innodb_descriptors_memory                integer
innodb_dict_tables                       integer
innodb_ibuf_merges                       integer
innodb_ibuf_segment_size                 integer
innodb_x_lock_spin_waits                 integer
key_blocks_not_flushed                   integer
not_flushed_delayed_rows                 integer
open_files                               integer
open_streams                             integer
open_table_definitions                   integer
open_tables                              integer
threads_connected                        integer
threads_created                          integer
threads_running                          integer
uptime                                   integer
uptime_since_flush_status                integer
```

4. Log in to Grafana, add the data source, and import the monitoring template

More monitoring templates are available at https://grafana.com/grafana/d.... Since a host-side monitor already exists, add another InfluxDB data source this time to keep things distinguishable:

- Click Configuration, choose Data Sources, click Add data source
- Data source name: MySQL (the data source in my template has this name; use it too or the template import will fail)
- URL: the InfluxDB address
- Database: the name of the InfluxDB database holding the MySQL monitoring data
- Click Save & Test to verify the addition works

With the data source ready, import the MySQL monitoring template: click the + on the left, choose import, click Upload .json file, and upload the template file. With that the template import succeeds.

5. Panels (screenshots)

Source: https://www.toutiao.com/i6859...
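To verify that data is flowing before building panels, a quick sanity query against one of the fields listed above can be run in the influx shell (the field name is taken from the list; the time window and grouping are arbitrary):

```sql
-- average connected threads per minute over the last hour
SELECT mean("threads_connected") FROM "mysql_mysql" WHERE time > now() - 1h GROUP BY time(1m)
```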

December 6, 2020 · 1 min · jiezi

About grafana: Localizing Grafana Plugins into Chinese

Localizing third-party plugins

I wrote earlier about localizing Grafana itself; on 7.2.1 most of that is done, with details still being polished. Today let's consider localizing some third-party plugins. Clicking into a plugin and seeing all English feels jarring; the boss won't be happy seeing it either. :-(

Pick a soft target: pie charts are quite intuitive for display, and Grafana has the Pie Chart plugin. It is small and used in a number of templates, so it is a good warm-up.

Concrete steps

Clone the project (repo: piechart-panel):

```bash
git clone git@github.com:grafana/piechart-panel.git
cd piechart-panel  # enter the directory
yarn install
```

I cloned the project straight into the directory where Grafana keeps plugins. My Grafana is a Docker image run for testing, with the plugin directory mounted to the host and the code cloned into that directory.

Localization work: per the layout above, the files to edit are all under src. Open the project in an IDE and edit the needed files. Taking the screenshot as an example, the first option selects the chart type, with option values pie / donut, which I changed via translation to 派 / 甜甜圈. Based on what you change, everything else affected must be changed too; I used search-and-replace to update the conditionals elsewhere in the code, in files similar to those shown on the right of the screenshot.

Build the plugin: after the edits, Grafana can detect that a plugin exists, but without a build it will tell you the plugin is not built, i.e. it does not recognize your project code. How to fix that? Per the official docs, run:

```bash
yarn dev
# when finished it prints, pleasingly:
# ✔ Bundling plugin in dev mode
# ✨ Done in 4.91s.
```

Once that completes, restart Grafana to see the results. Compare the original version with the localized one: before / after screenshots.

Test & debug: steps 2 and 3 above are basically a test/debug loop. I first localized all the options, then handled the option values, then built and restarted Grafana to check, going around until the goal was reached. I debug by starting Grafana in Docker on my machine and simply delete the container when done.

Continuous improvement: to keep working on a plugin over time, consider forking the original plugin repo and adding your own remote, then creating your own branch for the changes and having master fetch the upstream source to track its updates; a sketch follows. That way you install the plugin from your own repo. Sweet.
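A sketch of the fork workflow described in the last paragraph (remote and branch names are illustrative):

```bash
# Illustrative fork workflow: track upstream while keeping localization on its own branch
git remote add upstream git@github.com:grafana/piechart-panel.git
git checkout -b zh-cn        # do the localization work here
git fetch upstream           # pull in upstream updates
git rebase upstream/master   # or merge, to carry them into your branch
```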

November 30, 2020 · 1 min · jiezi

Installing Grafana on macOS

Install via brew => failed

```bash
brew update
brew install grafana
```

There was a problem: it would not start, apparently because of this warning during installation:

```
Warning: grafana dependency icu4c was built with a different C++ standard
library (libc++ from clang). This may cause problems at runtime.
Warning: The post-install step did not complete successfully
You can try again using `brew postinstall grafana`
```

Running brew postinstall grafana did not fix it either.

Manual installation

Download and extract:

```bash
wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-5.3.2.darwin-amd64.tar.gz
tar -zxvf grafana-5.3.2.darwin-amd64.tar.gz
```

Create a custom.ini file in the conf directory and put your own configuration in it; it can override the default settings in defaults.ini.

Start:

```bash
./bin/grafana-server web
```

That is the startup command from the official docs. Strangely, the 5.3.2 Grafana I downloaded from the official site had no bin directory after extraction, which is quite odd. ...
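A tiny sketch of what conf/custom.ini might contain (the values are arbitrary examples; the override mechanism itself is standard Grafana behavior):

```ini
; conf/custom.ini -- overrides defaults.ini; example values only
[server]
http_port = 3001

[security]
admin_password = changeme
```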

June 17, 2019 · 1 min · jiezi

Container Monitoring in Practice: Grafana

Overview

Grafana is an open-source project for visualizing metrics data at scale, and it can even alert on those metrics. Under the business-friendly Apache License 2.0, it is currently the first choice for displaying Prometheus monitoring data. Its strengths:

1. Usage:
- Easy configuration: supports composing Dashboards, Panels, Rows, and more, with line charts, bar charts, and many other chart types
- Good looks: choose a dark or light theme, or define your own color templates
- Many templates: the Grafana template community is active, with many user-contributed dashboards that can be imported and used directly
- Many data sources: as a display layer, Grafana supports many sources, such as Graphite, ES, and Prometheus
- Simple permission management: admin, viewer, and other roles to control access

2. Further development: if stock Grafana does not meet your needs and you want to build on it, the project offers plenty of support:
- Apache License 2.0: business friendly; change it however you like, even sell the result
- Complete API coverage: permissions, panels, users, and alerts can all be driven via API
- Multiple auth methods: OAuth, LDAP, Proxy; you can plug in your own company's auth system
- Plugin development: if you'd rather not change the code directly, write your own plugin
- Go + Angular + React: a common tech stack, convenient for further development

prometheus + grafana as a monitoring combo is convenient and powerful, and even better once auth is reworked. At first we also tried Grafana's built-in alerting, but unfortunately it is too limited for production; the alerting issues pile up and the maintainers don't want to change them, so we gave up on that.

Deployment

Step one: install Grafana. Grafana offers many deployment options. If display and alerting live outside the K8S cluster, a plain binary deployment works; if Grafana itself is inside the cluster, or the management side is also a K8S cluster, deploy with YAML:

Deployment config:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: kube-system
  name: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      namespace: kube-system
      annotations:
        grafana-version: '1.0'
      name: grafana
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:5.1.0
          imagePullPolicy: Always
          securityContext:
            runAsUser: 0
          env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: "admin"
          ports:
            - name: grafana
              containerPort: 3000
          resources:
            requests:
              memory: "100Mi"
              cpu: "100m"
            limits:
              memory: "2048Mi"
              cpu: "1024m"
```

Service config: ...
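The Service config is truncated in this excerpt. A minimal hypothetical sketch that matches the Deployment above (port 3000 and the app: grafana selector come from the Deployment; the Service type is my own assumption):

```yaml
# Hypothetical Service matching the Deployment above; the type is an assumption
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: grafana
spec:
  type: NodePort
  selector:
    app: grafana
  ports:
    - name: http
      port: 3000
      targetPort: 3000
```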

June 6, 2019 · 2 min · jiezi

Grafana: Regex Extraction on Chart Items

While building Zabbix charts in Grafana, I hit a problem: when monitoring SQL Server, the auto-discovered item names are extremely long, so if the values are shown on the chart they occupy a very large area and the chart itself becomes very narrow, as in the first figure. In fact, only the database name plus the trailing metric name are needed. The final result is shown in the second figure. The fix is simply adding Functions under the item. For more advanced usage see https://alexanderzobnin.githu...

May 15, 2019 · 1 min · jiezi

Monitoring Docker Containers' TcpState with cAdvisor, InfluxDB, and Grafana

Problem

After setting up the cAdvisor + InfluxDB + Grafana monitoring cluster, no TCP-related data shows up.

Source version: https://github.com/google/cad..., git commit hash 9db8c7dee20a0c41627b208977ab192a0411bf93. For setting up cAdvisor + InfluxDB + Grafana, see https://botleg.com/stories/mo...

Tracking it down

Is cAdvisor just not recording TCP state? It is easy to find that, because of cAdvisor's high CPU usage, people pass --disable_metrics="" (https://github.com/google/cad...). But that is actually not it. Start cadvisor locally without any flags:

```bash
~/gopath/src/github.com/google/cadvisor(master*) » sudo ./cadvisor -logtostderr
```

Open http://127.0.0.1:8080/containers/ in a browser; the response does contain TcpState.

Was it written into InfluxDB? Open the influx shell:

```
InfluxDB shell 0.9.6.1
> show databases
name: databases
---------------
name
_internal
mydb
cadvisor
> use cadvisor
Using database cadvisor
> show tag keys
name: cpu_usage_system
----------------------
tagKey
container_name
machine
```

These tagKeys correspond to the select columns in Grafana. So: did cAdvisor simply never write TCP series to InfluxDB? See cadvisor/storage/influxdb/influxdb.go:174:

```go
func (self *influxdbStorage) containerStatsToPoints(
	cInfo *info.ContainerInfo,
	stats *info.ContainerStats,
) (points []*influxdb.Point) {
	// CPU usage: Total usage in nanoseconds
	points = append(points, makePoint(serCpuUsageTotal, stats.Cpu.Usage.Total))

	// CPU usage: Time spend in system space (in nanoseconds)
	points = append(points, makePoint(serCpuUsageSystem, stats.Cpu.Usage.System))

	// CPU usage: Time spent in user space (in nanoseconds)
	points = append(points, makePoint(serCpuUsageUser, stats.Cpu.Usage.User))

	// CPU usage per CPU
	for i := 0; i < len(stats.Cpu.Usage.PerCpu); i++ {
		point := makePoint(serCpuUsagePerCpu, stats.Cpu.Usage.PerCpu[i])
		tags := map[string]string{"instance": fmt.Sprintf("%v", i)}
		addTagsToPoint(point, tags)
		points = append(points, point)
	}

	// Load Average
	points = append(points, makePoint(serLoadAverage, stats.Cpu.LoadAverage))

	// Memory Usage
	points = append(points, makePoint(serMemoryUsage, stats.Memory.Usage))

	// Working Set Size
	points = append(points, makePoint(serMemoryWorkingSet, stats.Memory.WorkingSet))

	// Network Stats
	points = append(points, makePoint(serRxBytes, stats.Network.RxBytes))
	points = append(points, makePoint(serRxErrors, stats.Network.RxErrors))
	points = append(points, makePoint(serTxBytes, stats.Network.TxBytes))
	points = append(points, makePoint(serTxErrors, stats.Network.TxErrors))

	self.tagPoints(cInfo, stats, points)

	return points
}
```

Conclusion

You have to modify the cAdvisor code and add the metrics you need yourself. ...
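A sketch of what that modification could look like, appended inside containerStatsToPoints above. The serTcpEstablished / serTcpTimeWait series names are hypothetical (you would define them next to the other ser* constants), and the stats.Network.Tcp fields follow cAdvisor's v1 TcpStat struct, so double-check the field names against your checkout:

```go
// Hypothetical addition to containerStatsToPoints: emit TCP connection-state counts.
// serTcpEstablished / serTcpTimeWait are new constants you would define, e.g. "tcp_established".
points = append(points, makePoint(serTcpEstablished, stats.Network.Tcp.Established))
points = append(points, makePoint(serTcpTimeWait, stats.Network.Tcp.TimeWait))
```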

April 17, 2019 · 1 min · jiezi

K8s and Monitoring: Deploying Grafana 6.0 on K8s

Preface: this article mainly introduces some new features of the latest Grafana 6.0 and how to deploy it to k8s.

Grafana 6.0 at a glance

This Grafana update introduces a new way to query and display data, support for log data, and a host of other features. The main highlights:

- Explore - a new query workflow for ad-hoc data exploration and troubleshooting.
- Grafana Loki - integration with Grafana Labs' new open-source log aggregation system.
- Gauge Panel - a new standalone panel type for gauges.
- New Panel Editor UX - improved panel editing with easy switching between visualizations.
- Google Stackdriver data source is out of beta and officially released.
- The Azure Monitor plugin was ported from an external plugin to a core data source.
- React plugin support makes building plugins easier.
- Named colors, included in our new and improved color picker.
- Removal of user session storage makes Grafana easier to deploy and improves security.

Clearly, Explore and Grafana Loki are features purpose-built to strengthen Grafana's log display story. That said, Loki, the log storage and retrieval framework inspired by Prometheus, has not had a release to date, and the maintainers advise against production use. Still, Loki is a technology worth watching: deeply integrated with k8s, it can be dedicated to handling logs inside k8s. Here is a screenshot of handling logs with Explore: (screenshot)

Deploying Grafana 6.0

Here is how to deploy Grafana 6.0 into k8s. Since our environment is an AWS-managed k8s, mind the pvc and svc parts and adapt them slightly when porting. Below is the ConfigMap, which mainly contains two config files, ldap.toml and grafana.ini. In a real enterprise environment you need to integrate with the company LDAP, hence ldap.toml:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: hawkeye-grafana
  name: hawkeye-grafana-cm
  namespace: sgt
data:
  ldap.toml: |-
    # To troubleshoot and get more log info enable ldap debug logging in grafana.ini
    # [log]
    # filters = ldap:debug
    [[servers]]
    # Ldap server host (specify multiple hosts space separated)
    host = "ldap.xxx.org"
    # Default port is 389 or 636 if use_ssl = true
    port = 389
    # Set to true if ldap server supports TLS
    use_ssl = false
    # Set to true if connect ldap server with STARTTLS pattern (create connection in insecure, then upgrade to secure connection with TLS)
    start_tls = false
    # set to true if you want to skip ssl cert validation
    ssl_skip_verify = false
    # set to the path to your root CA certificate or leave unset to use system defaults
    # root_ca_cert = "/path/to/certificate.crt"
    # Authentication against LDAP servers requiring client certificates
    # client_cert = "/path/to/client.crt"
    # client_key = "/path/to/client.key"
    # Search user bind dn
    bind_dn = "cn=Manager,dc=xxx,dc=com"
    # Search user bind password
    # If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
    bind_password = 'xxxxx'
    # User search filter, for example "(cn=%s)" or "(sAMAccountName=%s)" or "(uid=%s)"
    search_filter = "(cn=%s)"
    # An array of base dns to search through
    search_base_dns = ["ou=tech,cn=hawkeye,dc=xxxx,dc=com"]
    ## For Posix or LDAP setups that does not support member_of attribute you can define the below settings
    ## Please check grafana LDAP docs for examples
    # group_search_filter = "(&(objectClass=posixGroup)(memberUid=%s))"
    # group_search_base_dns = ["ou=groups,dc=grafana,dc=org"]
    # group_search_filter_user_attribute = "uid"
    # Specify names of the ldap attributes your ldap uses
    [servers.attributes]
    name = "givenName"
    surname = "sn"
    username = "cn"
    member_of = "memberOf"
    email = "email"
    # Map ldap groups to grafana org roles
    [[servers.group_mappings]]
    group_dn = "cn=admins,dc=grafana,dc=org"
    org_role = "Admin"
    # To make user an instance admin (Grafana Admin) uncomment line below
    # grafana_admin = true
    # The Grafana organization database id, optional, if left out the default org (id 1) will be used
    # org_id = 1
    [[servers.group_mappings]]
    group_dn = "cn=users,dc=grafana,dc=org"
    org_role = "Editor"
    [[servers.group_mappings]]
    # If you want to match all (or no ldap groups) then you can use wildcard
    group_dn = "*"
    org_role = "Viewer"
  grafana.ini: |-
    ##################### Grafana Configuration Example #####################
    #
    # Everything has defaults so you only need to uncomment things you want to
    # change
    # possible values : production, development
    ;app_mode = production
    # instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
    ;instance_name = ${HOSTNAME}
    #################################### Paths ####################################
    [paths]
    # Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
    ;data = /var/lib/grafana
    # Temporary files in data directory older than given duration will be removed
    ;temp_data_lifetime = 24h
    # Directory where grafana can store logs
    ;logs = /var/log/grafana
    # Directory where grafana will automatically scan and look for plugins
    ;plugins = /var/lib/grafana/plugins
    # folder that contains provisioning config files that grafana will apply on startup and while running.
    ;provisioning = conf/provisioning
    #################################### Server ####################################
    [server]
    # Protocol (http, https, socket)
    ;protocol = http
    # The ip address to bind to, empty will bind to all interfaces
    ;http_addr =
    # The http port to use
    http_port = 3000
    # The public facing domain name used to access grafana from a browser
    ;domain = localhost
    # Redirect to correct domain if host header does not match domain
    # Prevents DNS rebinding attacks
    ;enforce_domain = false
    # The full public facing url you use in browser, used for redirects and emails
    # If you use reverse proxy and sub path specify full url (with sub path)
    ;root_url = http://localhost:3000
    # Log web requests
    ;router_logging = false
    # the path relative working path
    ;static_root_path = public
    # enable gzip
    ;enable_gzip = false
    # https certs & key file
    ;cert_file =
    ;cert_key =
    # Unix socket path
    ;socket =
    #################################### Database ####################################
    [database]
    # You can configure the database connection by specifying type, host, name, user and password
    # as separate properties or as on string using the url properties.
    # Either "mysql", "postgres" or "sqlite3", it's your choice
    ;type = sqlite3
    ;host = 127.0.0.1:3306
    ;name = grafana
    ;user = root
    # If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
    ;password =
    # Use either URL or the previous fields to configure the database
    # Example: mysql://user:secret@host:port/database
    ;url =
    # For "postgres" only, either "disable", "require" or "verify-full"
    ;ssl_mode = disable
    # For "sqlite3" only, path relative to data_path setting
    ;path = grafana.db
    # Max idle conn setting default is 2
    ;max_idle_conn = 2
    # Max conn setting default is 0 (mean not set)
    ;max_open_conn =
    # Connection Max Lifetime default is 14400 (means 14400 seconds or 4 hours)
    ;conn_max_lifetime = 14400
    # Set to true to log the sql calls and execution times.
    log_queries =
    #################################### Session ####################################
    [session]
    # Either "memory", "file", "redis", "mysql", "postgres", default is "file"
    ;provider = file
    # Provider config options
    # memory: not have any config yet
    # file: session dir path, is relative to grafana data_path
    # redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
    # mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1:3306)/database_name`
    # postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
    ;provider_config = sessions
    # Session cookie name
    ;cookie_name = grafana_sess
    # If you use session in https only, default is false
    ;cookie_secure = false
    # Session life time, default is 86400
    ;session_life_time = 86400
    #################################### Data proxy ###########################
    [dataproxy]
    # This enables data proxy logging, default is false
    ;logging = false
    #################################### Analytics ####################################
    [analytics]
    # Server reporting, sends usage counters to stats.grafana.org every 24 hours.
    # No ip addresses are being tracked, only simple counters to track
    # running instances, dashboard and error counts. It is very helpful to us.
    # Change this option to false to disable reporting.
    ;reporting_enabled = true
    # Set to false to disable all checks to https://grafana.net
    # for new versions (grafana itself and plugins), check is used
    # in some UI views to notify that grafana or plugin update exists
    # This option does not cause any auto updates, nor send any information
    # only a GET request to http://grafana.com to get latest versions
    ;check_for_updates = true
    # Google Analytics universal tracking code, only enabled if you specify an id here
    ;google_analytics_ua_id =
    #################################### Security ####################################
    [security]
    # default admin user, created on startup
    ;admin_user = admin
    # default admin password, can be changed before first start of grafana, or in profile settings
    ;admin_password = admin
    # used for signing
    ;secret_key = xxxxx
    # Auto-login remember days
    ;login_remember_days = 7
    ;cookie_username = grafana_user
    ;cookie_remember_name = grafana_remember
    # disable gravatar profile images
    ;disable_gravatar = false
    # data source proxy whitelist (ip_or_domain:port separated by spaces)
    ;data_source_proxy_whitelist =
    # disable protection against brute force login attempts
    ;disable_brute_force_login_protection = false
    #################################### Snapshots ###########################
    [snapshots]
    # snapshot sharing options
    ;external_enabled = true
    ;external_snapshot_url = https://snapshots-origin.raintank.io
    ;external_snapshot_name = Publish to snapshot.raintank.io
    # remove expired snapshot
    ;snapshot_remove_expired = true
    #################################### Dashboards History ##################
    [dashboards]
    # Number dashboard versions to keep (per dashboard). Default: 20, Minimum: 1
    ;versions_to_keep = 20
    #################################### Users ###############################
    [users]
    # disable user signup / registration
    ;allow_sign_up = true
    # Allow non admin users to create organizations
    ;allow_org_create = true
    # Set to true to automatically assign new users to the default organization (id 1)
    ;auto_assign_org = true
    # Default role new users will be automatically assigned (if disabled above is set to true)
    ;auto_assign_org_role = Viewer
    # Background text for the user field on the login page
    ;login_hint = email or username
    # Default UI theme ("dark" or "light")
    ;default_theme = dark
    # External user management, these options affect the organization users view
    ;external_manage_link_url =
    ;external_manage_link_name =
    ;external_manage_info =
    # Viewers can edit/inspect dashboard settings in the browser. But not save the dashboard.
    ;viewers_can_edit = false
    [auth]
    # Set to true to disable (hide) the login form, useful if you use OAuth, defaults to false
    ;disable_login_form = false
    # Set to true to disable the signout link in the side menu. useful if you use auth.proxy, defaults to false
    ;disable_signout_menu = false
    # URL to redirect the user to after sign out
    ;signout_redirect_url =
    #################################### Anonymous Auth ##########################
    [auth.anonymous]
    # enable anonymous access
    ;enabled = false
    # specify organization name that should be used for unauthenticated users
    ;org_name = Main Org.
    # specify role for unauthenticated users
    ;org_role = Viewer
    #################################### Github Auth ##########################
    [auth.github]
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = user:email,read:org
    ;auth_url = https://github.com/login/oauth/authorize
    ;token_url = https://github.com/login/oauth/access_token
    ;api_url = https://api.github.com/user
    ;team_ids =
    ;allowed_organizations =
    #################################### Google Auth ##########################
    [auth.google]
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_client_id
    ;client_secret = some_client_secret
    ;scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
    ;auth_url = https://accounts.google.com/o/oauth2/auth
    ;token_url = https://accounts.google.com/o/oauth2/token
    ;api_url = https://www.googleapis.com/oauth2/v1/userinfo
    ;allowed_domains =
    #################################### Generic OAuth ##########################
    [auth.generic_oauth]
    ;enabled = false
    ;name = OAuth
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = user:email,read:org
    ;auth_url = https://foo.bar/login/oauth/authorize
    ;token_url = https://foo.bar/login/oauth/access_token
    ;api_url = https://foo.bar/user
    ;team_ids =
    ;allowed_organizations =
    ;tls_skip_verify_insecure = false
    ;tls_client_cert =
    ;tls_client_key =
    ;tls_client_ca =
    #################################### Grafana.com Auth ####################
    [auth.grafana_com]
    ;enabled = false
    ;allow_sign_up = true
    ;client_id = some_id
    ;client_secret = some_secret
    ;scopes = user:email
    ;allowed_organizations =
    #################################### Auth Proxy ##########################
    [auth.proxy]
    ;enabled = false
    ;header_name = X-WEBAUTH-USER
    ;header_property = username
    ;auto_sign_up = true
    ;ldap_sync_ttl = 60
    ;whitelist = 192.168.1.1, 192.168.2.1
    ;headers = Email:X-User-Email, Name:X-User-Name
    #################################### Basic Auth ##########################
    [auth.basic]
    ;enabled = true
    #################################### Auth LDAP ##########################
    [auth.ldap]
    enabled = true
    ;config_file = /etc/grafana/ldap.toml
    ;allow_sign_up = true
    #################################### SMTP / Emailing ##########################
    [smtp]
    enabled = true
    host = smtp.exmail.qq.com:465
    user = noreply@xxx.com
    # If the password contains # or ; you have to wrap it with trippel quotes. Ex """#password;"""
    password = AFxxxxxxYoQ2G
    from_address = noreply@xxxx.com
    from_name = Hawkeye
    ;cert_file =
    ;key_file =
    ;skip_verify = false
    ;from_address = admin@grafana.localhost
    ;from_name = Grafana
    # EHLO identity in SMTP dialog (defaults to instance_name)
    ;ehlo_identity = dashboard.example.com
    [emails]
    ;welcome_email_on_sign_up = false
    #################################### Logging ##########################
    [log]
    # Either "console", "file", "syslog". Default is console and file
    # Use space to separate multiple modes, e.g. "console file"
    ;mode = console file
    # Either "debug", "info", "warn", "error", "critical", default is "info"
    ;level = info
    # optional settings to set different levels for specific loggers. Ex filters = sqlstore:debug
    ;filters =
    # For "console" mode only
    [log.console]
    ;level =
    # log line format, valid options are text, console and json
    ;format = console
    # For "file" mode only
    [log.file]
    ;level =
    # log line format, valid options are text, console and json
    ;format = text
    # This enables automated log rotate(switch of following options), default is true
    ;log_rotate = true
    # Max line number of single file, default is 1000000
    ;max_lines = 1000000
    # Max size shift of single file, default is 28 means 1 << 28, 256MB
    ;max_size_shift = 28
    # Segment log daily, default is true
    ;daily_rotate = true
    # Expired days of log file(delete after max days), default is 7
    ;max_days = 7
    [log.syslog]
    ;level =
    # log line format, valid options are text, console and json
    ;format = text
    # Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
    ;network =
    ;address =
    # Syslog facility. user, daemon and local0 through local7 are valid.
    ;facility =
    # Syslog tag. By default, the process' argv[0] is used.
    ;tag =
    #################################### Alerting ############################
    [alerting]
    # Disable alerting engine & UI features
    ;enabled = true
    # Makes it possible to turn off alert rule execution but alerting UI is visible
    ;execute_alerts = true
    # Default setting for new alert rules. Defaults to categorize error and timeouts as alerting. (alerting, keep_state)
    ;error_or_timeout = alerting
    # Default setting for how Grafana handles nodata or null values in alerting. (alerting, no_data, keep_state, ok)
    ;nodata_or_nullvalues = no_data
    # Alert notifications can include images, but rendering many images at the same time can overload the server
    # This limit will protect the server from render overloading and make sure notifications are sent out quickly
    ;concurrent_render_limit = 5
    #################################### Explore #############################
    [explore]
    # Enable the Explore section
    ;enabled = false
    #################################### Internal Grafana Metrics ##########################
    # Metrics available at HTTP API Url /metrics
    [metrics]
    # Disable / Enable internal metrics
    ;enabled = true
    # Publish interval
    ;interval_seconds = 10
    # Send internal metrics to Graphite
    [metrics.graphite]
    # Enable by setting the address setting (ex localhost:2003)
    ;address =
    ;prefix = prod.grafana.%(instance_name)s.
    #################################### Distributed tracing ############
    [tracing.jaeger]
    # Enable by setting the address sending traces to jaeger (ex localhost:6831)
    ;address = localhost:6831
    # Tag that will always be included in when creating new spans. ex (tag1:value1,tag2:value2)
    ;always_included_tag = tag1:value1
    # Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
    ;sampler_type = const
    # jaeger samplerconfig param
    # for "const" sampler, 0 or 1 for always false/true respectively
    # for "probabilistic" sampler, a probability between 0 and 1
    # for "rateLimiting" sampler, the number of spans per second
    # for "remote" sampler, param is the same as for "probabilistic"
    # and indicates the initial sampling rate before the actual one
    # is received from the mothership
    ;sampler_param = 1
    #################################### Grafana.com integration ##########################
    # Url used to import dashboards directly from Grafana.com
    [grafana_com]
    ;url = https://grafana.com
    #################################### External image storage ##########################
    [external_image_storage]
    # Used for uploading images to public servers so they can be included in slack/email messages.
    # you can choose between (s3, webdav, gcs, azure_blob, local)
    ;provider =
    [external_image_storage.s3]
    ;bucket =
    ;region =
    ;path =
    ;access_key =
    ;secret_key =
    [external_image_storage.webdav]
    ;url =
    ;public_url =
    ;username =
    ;password =
    [external_image_storage.gcs]
    ;key_file =
    ;bucket =
    ;path =
    [external_image_storage.azure_blob]
    ;account_name =
    ;account_key =
    ;container_name =
    [external_image_storage.local]
    # does not require any configuration
    [rendering]
    # Options to configure external image rendering server like https://github.com/grafana/grafana-image-renderer
    ;server_url =
    ;callback_url =
```

Everywhere I typed xxx above has been redacted to hide some of our company's important information; configure and modify it according to your own situation. If you do not need LDAP authentication, delete ldap.toml from the ConfigMap and change true to false in this block of grafana.ini:

```ini
#################################### Auth LDAP ##########################
[auth.ldap]
enabled = true
;config_file = /etc/grafana/ldap.toml
;allow_sign_up = true
```

deployment.yaml is as follows:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hawkeye-grafana
  namespace: sgt
  labels:
    app: hawkeye-grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hawkeye-grafana
  template:
    metadata:
      labels:
        app: hawkeye-grafana
    spec:
      containers:
        - image: grafana/grafana:6.0.0
          name: grafana
          imagePullPolicy: IfNotPresent
          # env:
          env:
            - name: GF_PATHS_PROVISIONING
              value: /var/lib/grafana/provisioning
          resources:
            # keep request = limit to keep this container in guaranteed class
            limits:
              cpu: 100m
              memory: 100Mi
            requests:
              cpu: 100m
              memory: 100Mi
          readinessProbe:
            httpGet:
              path: /login
              port: 3000
            # initialDelaySeconds: 30
            # timeoutSeconds: 1
          volumeMounts:
            - name: grafana-persistent-storage
              mountPath: /var/lib/grafana/
            - name: config
              mountPath: /etc/grafana/
      initContainers:
        - name: "init-chown-data"
          image: "busybox:latest"
          imagePullPolicy: "IfNotPresent"
          command: ["chown", "-R", "472:472", "/var/lib/grafana/"]
          volumeMounts:
            - name: grafana-persistent-storage
              mountPath: /var/lib/grafana/
              subPath: ""
      volumes:
        - name: config
          configMap:
            name: hawkeye-grafana-cm
        - name: grafana-persistent-storage
          persistentVolumeClaim:
            claimName: hawkeye-grafana-claim
```

Note the added initContainers, which mainly solves the write-permission problem on the mounted volume.

service.yaml is as follows:

```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  labels:
    app: hawkeye-grafana
  name: hawkeye-grafana
  namespace: sgt
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 3000
  selector:
    app: hawkeye-grafana
```

pvc.yaml is as follows:

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hawkeye-grafana-claim
  namespace: sgt
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
```

Once everything is applied successfully and the page is reachable, log in with admin/admin. You can see the new Explore icon on the left.

Summary

Grafana first reads the config file from /usr/share/grafana/conf/defaults.ini, then reads /etc/grafana/grafana.ini; for the same parameter, /etc/grafana/grafana.ini overrides /usr/share/grafana/conf/defaults.ini. A parameter given on the command line in turn overrides the same parameter in grafana.ini, and finally the same setting in an environment variable overrides the command line. These are the default environment variables:

| Variable | Default |
| --- | --- |
| GF_PATHS_CONFIG | /etc/grafana/grafana.ini |
| GF_PATHS_DATA | /var/lib/grafana |
| GF_PATHS_HOME | /usr/share/grafana |
| GF_PATHS_LOGS | /var/log/grafana |
| GF_PATHS_PLUGINS | /var/lib/grafana/plugins |
| GF_PATHS_PROVISIONING | /etc/grafana/provisioning |
 ...
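The Deployment sets GF_PATHS_PROVISIONING to /var/lib/grafana/provisioning but the article does not show a provisioning file. As a hedged sketch of what could live there (this follows Grafana's standard data-source provisioning format; the Prometheus URL is a placeholder, not from the article):

```yaml
# /var/lib/grafana/provisioning/datasources/prometheus.yaml -- illustrative only
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.example:9090   # placeholder
    isDefault: true
```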

February 28, 2019 · 12 min · jiezi

A Log Monitoring System in Go + InfluxDB + Grafana

docker 运行 influxdb grafanadocker 启动 influxdb# 启动 docker$ sudo docker run -d -p 8083:8083 -p8086:8086 –expose 8090 –expose 8099 –name indb -v /data/dockerdata/influxdb:/var/lib/influxdb docker.io/influxdb# 创建数据库和用户$ sudo docker exec -it indb /bin/bash> create User nginx with password ‘123456’> GRANT ALL PRIVILEGES ON monitor TO nginx > CREATE RETENTION POLICY “monitor_retention” ON “monitor” DURATION 30d REPLICATION 1 DEFAULT docker 启动 grafana# grafana 5.10 后创建数据卷需要传入权限, 使用 nginx 反代需要设置 server root# 使用 link 连接其他容器,访问该容器内容直接使用 容器名称sudo docker run -d \ -p 3000:3000 \ -e INFLUXDB_HOST=localhost \ -e INFLUXDB_PORT=8086 \ -e INFLUXDB_NAME=monitor \ -e INFLUXDB_USER=nginx \ -e INFLUXDB_PASS=123456 \ -e “GF_SECURITY_ADMIN_PASSWORD=123456” \ -e “GF_SERVER_ROOT_URL=https://www.amoyiki.com/monitor/” \ -v /data/dockerdata/grafana:/var/lib/grafana \ –link indb:indb \ –user root \ –name grafana \ grafana/grafana配置 grafana 数据源PS Access 使用 Server 即可go 项目编写 go 代码本代码完全照搬慕课网视频教程package mainimport ( “bufio” “encoding/json” “flag” “fmt” “github.com/influxdata/influxdb/client/v2” “io” “log” “net/http” “net/url” “os” “regexp” “strconv” “strings” “time”)type Reader interface { Read(rc chan []byte)}type Writer interface { Write(wc chan *Message)}type LogProcess struct { rc chan []byte // 读取 -> 解析 wc chan *Message // 解析 -> 写入 reader Reader writer Writer}type Message struct { TimeLocal time.Time ByteSent int Path, Method, Scheme, Status string RequestTime float64}type SystemInfo struct { HandleLine int json: "handleLine" // 总日志行数 Tps float64 json: "tps" // 系统吞吐量 ReadChanLen int json: "readChanLen" // read channel 长度 WriteChanLen int json: "wirteChanLen" // wirte channel 长度 RunTime string json:"runTime" // 总运行时间 ErrNum int json:"errTime" // 错误数}const ( TypeHandleLine = 0 TypeErrNum = 1)var TypeMonitorChan = make(chan int, 200)type Monitor struct { startTime time.Time data SystemInfo tpsSlic []int}func (m *Monitor) start(lp *LogProcess) { go func() { for n := range TypeMonitorChan { switch n { case TypeErrNum: m.data.ErrNum += 1 case TypeHandleLine: m.data.HandleLine += 1 } } }() ticker := time.NewTicker(time.Second * 5) go func() { for { <-ticker.C m.tpsSlic = append(m.tpsSlic, m.data.HandleLine) if len(m.tpsSlic) > 2 { m.tpsSlic = m.tpsSlic[1:] } } }() http.HandleFunc("/monitor", func(writer http.ResponseWriter, request *http.Request) { m.data.RunTime = time.Now().Sub(m.startTime).String() m.data.ReadChanLen = len(lp.rc) m.data.WriteChanLen = len(lp.wc) if len(m.tpsSlic) >= 2 { m.data.Tps = float64(m.tpsSlic[1]-m.tpsSlic[0]) / 5 } ret, _ := json.MarshalIndent(m.data, “”, “\t”) io.WriteString(writer, string(ret)) }) http.ListenAndServe(":9999", nil)}type ReadFromFile struct { path string // 读取文件的地址}func (r *ReadFromFile) Read(rc chan []byte) { // 打开文件 f, err := os.Open(r.path) if err != nil { panic(fmt.Sprintln(“open file error: %s”, err.Error())) } // 从文件末尾逐行读取文件内容 f.Seek(0, 2) rd := bufio.NewReader(f) for { line, err := rd.ReadBytes(’\n’) if err == io.EOF { time.Sleep(500 * time.Millisecond) continue } else if err != nil { panic(fmt.Sprintln(“read file error: %s”, err.Error())) } TypeMonitorChan <- TypeHandleLine rc <- line[:len(line)-1] }}type WriteToInfluxDB struct { influxDBsn string}func (w WriteToInfluxDB) Write(wc chan Message) { // 写入模块 infSli := strings.Split(w.influxDBsn, “&”) // Create a new HTTPClient c, err := client.NewHTTPClient(client.HTTPConfig{ Addr: infSli[0], Username: infSli[1], Password: infSli[2], }) if err != nil { log.Fatal(err) } defer c.Close() // Create a new point batch bp, err := 
Write the startup script (used later for the Docker deployment; the Dockerfile below expects it to be saved as start_process.sh):

./log_process --path "/var/log/nginx/access.log" --influxDsn "http://indb:8086&root&root&monitor&s"

Build the Go project:

go build log_process.go

Write the Dockerfile:

FROM golang:latest
MAINTAINER amoyiki "amoyiki@gmail.com"
WORKDIR $GOPATH/src/amoyiki.com/nginxlog
ADD . $GOPATH/src/amoyiki.com/nginxlog
EXPOSE 9999
ENTRYPOINT ["sh", "./start_process.sh"]

Build the image and start the container:

$ sudo docker build -t log_process .

# mount the nginx log directory as a volume so the container can read the logs,
# and link the influxdb container so the Go program can connect to it
$ sudo docker run --name log1 -d -v /var/log/nginx:/var/log/nginx --link indb:indb log_process

# list all containers
$ sudo docker ps -a

Final result: the metrics appear in Grafana (screenshot in the original post).

More articles are available on my blog: 四畳半神话大系
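Before building Grafana panels on top of this, it can help to confirm that points are actually landing in InfluxDB. Here is a minimal sketch using the same client/v2 library as the processor; the address, the root/root credentials, and the monitor database are taken from the startup script above and are assumptions to adjust for your setup:

package main

import (
	"log"

	"github.com/influxdata/influxdb/client/v2"
)

func main() {
	// assumes the indb container above, published on localhost:8086,
	// with the root/root credentials used in the startup script
	c, err := client.NewHTTPClient(client.HTTPConfig{
		Addr:     "http://localhost:8086",
		Username: "root",
		Password: "root",
	})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// count the points the log processor has written to the nginx_log measurement
	q := client.NewQuery(`SELECT COUNT("RequestTime") FROM "nginx_log"`, "monitor", "s")
	resp, err := c.Query(q)
	if err != nil {
		log.Fatal(err)
	}
	if resp.Error() != nil {
		log.Fatal(resp.Error())
	}
	for _, r := range resp.Results {
		for _, s := range r.Series {
			log.Printf("measurement=%s values=%v", s.Name, s.Values)
		}
	}
}

If the count stays at zero while the processor logs "write influxdb success ...", check the database name and precision segments of the DSN first, since Write splits them positionally on "&".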

December 13, 2018 · 4 min · jiezi

Monitoring the JVM with Prometheus + Grafana

Original article: see the source post.

Summary. Tools used:

- Docker: this walkthrough uses Docker heavily to start each application.
- Prometheus: scrapes and stores the metrics, and answers queries over them.
- Grafana: data visualization.
- JMX exporter: exposes JVM-related information from JMX.
- Tomcat: stands in for a Java application.

The overall steps:

1. Use the JMX exporter to run a small HTTP server inside each Java process.
2. Configure Prometheus to scrape the data that HTTP server exposes.
3. Connect Grafana to Prometheus and set up a dashboard.

Step 1: start a few Java applications

1) Create a directory named prom-jvm-demo.

2) Download the JMX exporter into that directory.

3) Create a file simple-config.yml with the following content:

---
blacklistObjectNames: ["*:*"]

4) Run the following commands to start three Tomcats, replacing <path-to-prom-jvm-demo> with the real path:

docker run -d \
  --name tomcat-1 \
  -v <path-to-prom-jvm-demo>:/jmx-exporter \
  -e CATALINA_OPTS="-Xms64m -Xmx128m -javaagent:/jmx-exporter/jmx_prometheus_javaagent-0.3.1.jar=6060:/jmx-exporter/simple-config.yml" \
  -p 6060:6060 \
  -p 8080:8080 \
  tomcat:8.5-alpine

docker run -d \
  --name tomcat-2 \
  -v <path-to-prom-jvm-demo>:/jmx-exporter \
  -e CATALINA_OPTS="-Xms64m -Xmx128m -javaagent:/jmx-exporter/jmx_prometheus_javaagent-0.3.1.jar=6060:/jmx-exporter/simple-config.yml" \
  -p 6061:6060 \
  -p 8081:8080 \
  tomcat:8.5-alpine

docker run -d \
  --name tomcat-3 \
  -v <path-to-prom-jvm-demo>:/jmx-exporter \
  -e CATALINA_OPTS="-Xms64m -Xmx128m -javaagent:/jmx-exporter/jmx_prometheus_javaagent-0.3.1.jar=6060:/jmx-exporter/simple-config.yml" \
  -p 6062:6060 \
  -p 8082:8080 \
  tomcat:8.5-alpine

5) Visit http://localhost:8080, 8081 and 8082 to check that each Tomcat started.

6) Visit the corresponding http://localhost:6060, 6061 and 6062 to see the metrics the JMX exporter provides.

Note: this simple-config.yml only exposes JVM information; for more elaborate configurations, see the JMX exporter documentation.

Step 2: start Prometheus

1) In the prom-jvm-demo directory created earlier, create a file prom-jmx.yml with the following content:

scrape_configs:
  - job_name: 'java'
    static_configs:
      - targets:
          - '<host-ip>:6060'
          - '<host-ip>:6061'
          - '<host-ip>:6062'

2) Start Prometheus:

docker run -d \
  --name=prometheus \
  -p 9090:9090 \
  -v <path-to-prom-jvm-demo>:/prometheus-config \
  prom/prometheus --config.file=/prometheus-config/prom-jmx.yml

3) Visit http://localhost:9090 to check that Prometheus started, enter jvm_info in the query box and execute; you should see results for three instances. If the three instances do not appear, wait a moment and try again.

Step 3: configure Grafana

1) Start Grafana:

docker run -d --name=grafana -p 3000:3000 grafana/grafana

2) Visit http://localhost:3000 and log in with admin/admin.

3) Add a Prometheus data source.

4) Fill in the data source settings: Name: anything you like; Type: Prometheus; URL: http://<host-ip>:9090. Leave the rest unset, click Save & Test, and it should report success.

5) Import a dashboard; there is no need to build one from scratch.

6) Use the author's JVM dashboard: its ID is 8563; enter that ID on the import page.

7) Click elsewhere on the page and wait a moment, then select the data source when prompted.

8) Finally, open the imported dashboard (screenshots in the original post).
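If the three instances fail to show up in Prometheus, it can save time to check the exporter endpoints directly. Below is a minimal sketch in Go, not part of the original article, assuming the containers above are published on localhost:6060-6062 as in step 1; it fetches each /metrics page and looks for the jvm_info metric queried earlier:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

func main() {
	// assumes the three Tomcat containers above, with the JMX exporter
	// published on ports 6060-6062 of the local host
	for _, addr := range []string{"localhost:6060", "localhost:6061", "localhost:6062"} {
		resp, err := http.Get("http://" + addr + "/metrics")
		if err != nil {
			log.Printf("%s: unreachable: %v", addr, err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		// jvm_info is the same metric queried in Prometheus in step 2 above
		if strings.Contains(string(body), "jvm_info") {
			fmt.Printf("%s: exporter up, jvm_info present\n", addr)
		} else {
			fmt.Printf("%s: responded, but jvm_info not found\n", addr)
		}
	}
}

If an endpoint is unreachable from another machine, remember that the prom-jmx.yml targets use <host-ip>, so the exporter ports must be reachable from wherever Prometheus runs, not just from localhost.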

October 25, 2018 · 1 min · jiezi