Sampler 是一个用于 shell 命令执行,可视化和告警的工具。其配置应用的是一个简略的 YAML 文件。
1、为什么我须要它?
你能够间接从终端对任意动静过程进行采样 – 察看数据库中的更改,监控 MQ 动静音讯(in-flight messages),触发部署脚本并在实现后获取告诉。
如果有一种办法能够应用 shell 命令获取指标(metric),那么能够应用 Sampler 立刻对其进行可视化。
2、装置
macOS
brew cask install sampler
或
sudo curl -Lo /usr/local/bin/sampler https://github.com/sqshq/sampler/releases/download/v1.0.3/sampler-1.0.3-darwin-amd64
sudo chmod +x /usr/local/bin/sampler
Linux
sudo wget https://github.com/sqshq/sampler/releases/download/v1.0.3/sampler-1.0.3-linux-amd64 -O /usr/local/bin/sampler
sudo chmod +x /usr/local/bin/sampler
留神 :须要为 Sampler 装置 libasound2-dev 零碎库用以播放触发器声音。通常库已装置在相应地位,但如果没有 – 你能够应用你习惯的包管理器进行装置,例如 apt install libasound2-dev
Windows(试验)
倡议在高级控制台模拟器下应用,如 Cmder
Download .exe
3、应用
指定 shell 命令,Sampler 会相应的速率执行这些命令。输入用于可视化。
应用 Sampler 基本上的三步过程:
在 YAML 配置文件中定义 shell 命令
运行 sampler -c config.yml
在 UI 上调整组件大小和地位
市面早已有许多监控零碎
Sampler 绝不是监控零碎的替代品,而是易于设置的开发工具。
如果 spinning up 和应用 Grafana 配置 Prometheus 是齐全多余的工作,那么 Sampler 可能是正确的解决方案。没有服务器,没有数据库,不须要部署 – 你指定了 shell 命令,它就能够工作了。
我监控的每台服务器上都须要装置吗?
不,你能够在本地运行 Sampler,但依然能够从多台近程计算机上收集遥测数据。任何可视化都可能具备 init 命令,你能够在其中 ssh 到近程服务器。请参阅 SSH example
4、组件
以下是每种组件类型的配置示例列表,其中蕴含与 macOS 兼容的采样脚本。
Runchart
runcharts:
- title: Search engine response time
rate-ms: 500 # sampling rate, default = 1000
scale: 2 # number of digits after sample decimal point, default = 1
legend:
enabled: true # enables item labels, default = true
details: false # enables item statistics: cur/min/max/dlt values, default = true
items:
- label: GOOGLE
sample: curl -o /dev/null -s -w '%{time_total}' https://www.google.com
color: 178 # 8-bit color number, default one is chosen from a pre-defined palette
- label: YAHOO
sample: curl -o /dev/null -s -w '%{time_total}' https://search.yahoo.com
- label: BING
sample: curl -o /dev/null -s -w '%{time_total}' https://www.bing.com
Sparkline
sparklines:
- title: CPU usage
rate-ms: 200
scale: 0
sample: ps -A -o %cpu | awk '{s+=$1} END {print s}'
- title: Free memory pages
rate-ms: 200
scale: 0
sample: memory_pressure | grep 'Pages free' | awk '{print $3}'
Barchart
barcharts:
- title: Local network activity
rate-ms: 500 # sampling rate, default = 1000
scale: 0 # number of digits after sample decimal point, default = 1
items:
- label: UDP bytes in
sample: nettop -J bytes_in -l 1 -m udp | awk '{sum += $4} END {print sum}'
- label: UDP bytes out
sample: nettop -J bytes_out -l 1 -m udp | awk '{sum += $4} END {print sum}'
- label: TCP bytes in
sample: nettop -J bytes_in -l 1 -m tcp | awk '{sum += $4} END {print sum}'
- label: TCP bytes out
sample: nettop -J bytes_out -l 1 -m tcp | awk '{sum += $4} END {print sum}'
Gauge
gauges:
- title: Minute progress
rate-ms: 500 # sampling rate, default = 1000
scale: 2 # number of digits after sample decimal point, default = 1
percent-only: false # toggle display of the current value, default = false
color: 178 # 8-bit color number, default one is chosen from a pre-defined palette
cur:
sample: date +%S # sample script for current value
max:
sample: echo 60 # sample script for max value
min:
sample: echo 0 # sample script for min value
- title: Year progress
cur:
sample: date +%j
max:
sample: echo 365
min:
sample: echo 0
Textbox
textboxes:
- title: Local weather
rate-ms: 10000 # sampling rate, default = 1000
sample: curl wttr.in?0ATQF
border: false # border around the item, default = true
color: 178 # 8-bit color number, default is white
- title: Docker containers stats
rate-ms: 500
sample: docker stats --no-stream --format "table {{.Name}}t{{.CPUPerc}}t{{.MemUsage}}t{{.PIDs}}"
Asciibox
asciiboxes:
- title: UTC time
rate-ms: 500 # sampling rate, default = 1000
font: 3d # font type, default = 2d
border: false # border around the item, default = true
color: 43 # 8-bit color number, default is white
sample: env TZ=UTC date +%r
5、额定性能
Triggers
触发器容许执行条件操作,如视觉 / 声音告警或任意 shell 命令。以下示例阐明了此概念。
Clock gauge,从开始的每分钟显示工夫进度和以后工夫
gauges:
- title: MINUTE PROGRESS
position: [[0, 18], [80, 0]]
cur:
sample: date +%S
max:
sample: echo 60
min:
sample: echo 0
triggers:
- title: CLOCK BELL EVERY MINUTE
condition: '[$label =="cur"] && [$cur -eq 0] && echo 1 || echo 0' # expects "1" as TRUE indicator
actions:
terminal-bell: true # standard terminal bell, default = false
sound: true # NASA quindar tone, default = false
visual: false # notification with current value on top of the component area, default = false
script: say -v samantha `date +%I:%M%p` # an arbitrary script, which can use $cur, $prev and $label variables
搜索引擎提早图表,在提早超过阈值时向用户收回告警
runcharts:
- title: SEARCH ENGINE RESPONSE TIME (sec)
rate-ms: 200
items:
- label: GOOGLE
sample: curl -o /dev/null -s -w '%{time_total}' https://www.google.com
- label: YAHOO
sample: curl -o /dev/null -s -w '%{time_total}' https://search.yahoo.com
triggers:
- title: Latency threshold exceeded
condition: echo "$prev < 0.3 && $cur > 0.3" |bc -l # expects "1" as TRUE indicator
actions:
terminal-bell: true # standard terminal bell, default = false
sound: true # NASA quindar tone, default = false
visual: true # visual notification on top of the component area, default = false
script: 'say alert: ${label} latency exceeded ${cur} second' # an arbitrary script, which can use $cur, $prev and $label variables
交互式 shell 反对
除了 sample 命令之外,还能够指定 init 命令(在采样前仅执行一次)和 transform 命令(后处理采样命令输入)。这包含交互式 shell 用例,例如仅建设与数据库的连贯一次,而后在交互式 shell 会话中执行轮询。
Basic mode
textboxes:
- title: MongoDB polling
rate-ms: 500
init: mongo --quiet --host=localhost test # executes only once to start the interactive session
sample: Date.now(); # executes with a required rate, in scope of the interactive session
transform: echo result = $sample # executes in scope of local session, $sample variable is available for transformation
PTY mode
在某些状况下,交互式 shell 将无奈工作,因为它的 stdin 不是终端。这种状况下咱们能够应用 PTY 模式:
textboxes:
- title: Neo4j polling
pty: true # enables pseudo-terminal mode, default = false
init: cypher-shell -u neo4j -p pwd --format plain
sample: RETURN rand();
transform: echo "$sample" | tail -n 1
- title: Top on a remote server
pty: true # enables pseudo-terminal mode, default = false
init: ssh -i ~/user.pem ec2-user@1.2.3.4
sample: top
init 命令逐渐执行
在开始采样之前,还能够一一执行多个 init 命令。
textboxes:
- title: Java application uptime
multistep-init:
- java -jar jmxterm-1.0.0-uber.jar
- open host:port # or local PID
- bean java.lang:type=Runtime
sample: get Uptime
变量
如果配置文件蕴含反复的模式,则能够将它们提取到变量局部。此外,还能够在启动时应用 -v/–variable 标记指定变量,并且任意的零碎环境变量也能够在脚本中应用。
variables:
mongoconnection: mongo --quiet --host=localhost test
barcharts:
- title: MongoDB documents by status
items:
- label: IN_PROGRESS
init: $mongoconnection
sample: db.getCollection('events').find({status:'IN_PROGRESS'}).count()
- label: SUCCESS
init: $mongoconnection
sample: db.getCollection('events').find({status:'SUCCESS'}).count()
- label: FAIL
init: $mongoconnection
sample: db.getCollection('events').find({status:'FAIL'}).count()
色彩主题
theme: light # default = dark
sparklines:
- title: CPU usage
sample: ps -A -o %cpu | awk '{s+=$1} END {print s}'
6、实在场景
数据库
以下是不同的数据库连贯示例。倡议应用交互式 shell(init 脚本)仅建设一次连贯,而后在采样期间重用即可。
MySQL
# prerequisite: installed mysql shell
variables:
mysql_connection: mysql -u root -s --database mysql --skip-column-names
sparklines:
- title: MySQL (random number example)
pty: true
init: $mysql_connection
sample: select rand();
PostgreSQL
# prerequisite: installed psql shell
variables:
PGPASSWORD: pwd
postgres_connection: psql -h localhost -U postgres --no-align --tuples-only
sparklines:
- title: PostgreSQL (random number example)
init: $postgres_connection
sample: select random();
MongoDB
# prerequisite: installed mongo shell
variables:
mongo_connection: mongo --quiet --host=localhost test
sparklines:
- title: MongoDB (random number example)
init: $mongo_connection
sample: Math.random();
Neo4j
# prerequisite: installed cypher shell
variables:
neo4j_connection: cypher-shell -u neo4j -p pwd --format plain
sparklines:
- title: Neo4j (random number example)
pty: true
init: $neo4j_connection
sample: RETURN rand();
transform: echo "$sample" | tail -n 1
Kafka
查看 kafka lag 值,计算每个队列 lag 值的和,高于阈值报警,多 consumergroup,多 topic。
variables:
kafka_connection: $KAFKA_HOME/bin/kafka-consumer-groups --bootstrap-server localhost:9092
runcharts:
- title: Kafka lag per consumer group
rate-ms: 5000
scale: 0
items:
- label: A->B
sample: $kafka_connection --group group_a --describe | awk 'NR>1 {sum += $5} END {print sum}'
- label: B->C
sample: $kafka_connection --group group_b --describe | awk 'NR>1 {sum += $5} END {print sum}'
- label: C->D
sample: $kafka_connection --group group_c --describe | awk 'NR>1 {sum += $5} END {print sum}'
Docker
Docker 容器统计信息(CPU,MEM,O/I)
textboxes:
- title: Docker containers stats
sample: docker stats --no-stream --format "table {{.Name}}t{{.CPUPerc}}t{{.MemPerc}}t{{.MemUsage}}t{{.NetIO}}t{{.BlockIO}}t{{.PIDs}}"
SSH
近程服务器上的 TOP 命令
variables:
sshconnection: ssh -i ~/my-key-pair.pem ec2-user@1.2.3.4
textboxes:
- title: SSH
pty: true
init: $sshconnection
sample: top
JMX
Java 应用程序的失常运行示例
# prerequisite: download [jmxterm jar file](https://docs.cyclopsgroup.org/jmxterm)
textboxes:
- title: Java application uptime
multistep-init:
- java -jar jmxterm-1.0.0-uber.jar
- open host:port # or local PID
- bean java.lang:type=Runtime
sample: get Uptime
transform: echo $sample | tr -dc '0-9' | awk '{printf"%.1f min", $1/1000/60}'
来自:FreeBuf.COM,作者:secist
链接:https://www.freebuf.com/secto…