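Background: the article "Monitoring the JVM with Prometheus+Grafana" described how to monitor our Java applications with jvm-exporter. In our case we need to monitor JVMs running inside a k8s cluster, so this article covers extending the k8s and Prometheus integration. It assumes Prometheus is already deployed to the k8s cluster (see "Integrating Prometheus+Grafana monitoring with Kubernetes"). kube-prometheus does not integrate jvm-exporter, however, so we have to set that up ourselves.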
-
Integrate jvm-exporter into our application
The integration is simple: just add jvm-exporter as a javaagent to the Java startup command. See "Monitoring the JVM with Prometheus+Grafana" for details.
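As a rough sketch of what that startup command can look like: the snippet below assembles the `-javaagent` option and prints the resulting command instead of launching anything. The jar version matches the `jmx_exporter_build_info` line in the sample output at the end of this article, and port 9093 matches the Service template below. The file paths and `app.jar` are placeholders I made up for illustration.

```shell
# Hypothetical locations; adjust to wherever the agent jar and its config live.
AGENT_JAR=/opt/jmx_exporter/jmx_prometheus_javaagent-0.14.0.jar
AGENT_CONFIG=/opt/jmx_exporter/config.yaml
METRICS_PORT=9093

# The agent argument has the form <jar>=<port>:<config file>.
JAVA_AGENT_OPT="-javaagent:${AGENT_JAR}=${METRICS_PORT}:${AGENT_CONFIG}"

# Print the full command for review; app.jar stands in for the real application.
echo "java ${JAVA_AGENT_OPT} -jar app.jar"
```

In a container image, the same option would typically go into whatever variable or entrypoint already carries the JVM flags.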
-
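Configure Prometheus service discovery
For services exposed through a Service, we can configure service discovery with the ServiceMonitor CRD defined by the prometheus-operator project. A template follows: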
--- # ServiceMonitor: service auto-discovery rule
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # use the value of the Service's "metrics" label as the job label, i.e. job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # discover Services labeled metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; several may be listed
    - my-namespace
  endpoints:
  - port: http-metrics # port to scrape; this is the Service port name, i.e. spec.ports.name in the Service yaml
    interval: 15s # scrape interval
--- # Service template
apiVersion: v1
kind: Service
metadata:
  labels:
    metrics: jmx-metrics # the key label that the ServiceMonitor matches on
  name: jmx-metrics
  namespace: my-namespace
spec:
  ports:
  - name: http-metrics # matches spec.endpoints.port in the ServiceMonitor
    port: 9093 # service port exposed by jmx-exporter
    targetPort: http-metrics # port name exposed in the pod yaml
  selector:
    metrics: jmx-metrics # the Service's own label selector
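The configuration above sets up service discovery for the jmx-metrics Service in the my-namespace namespace. Prometheus adds every pod behind this Service to monitoring and fetches the latest pod list from the apiserver, so new replicas are picked up automatically when the service scales out.
For services that have no Service, such as instances exposed outside the cluster via HostPort, we can use PodMonitor for service discovery. An example: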
--- # PodMonitor: pod auto-discovery rule; supported in recent versions, older versions may not support it
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # use the value of the pod's "metrics" label as the job label, i.e. job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # discover pods labeled metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; several may be listed
    - my-namespace
  podMetricsEndpoints:
  - port: http-metrics # name of the metrics port in the Pod yaml, i.e. spec.containers[].ports[].name
    interval: 15s # scrape interval
--- # template of the Pod to monitor
apiVersion: v1
kind: Pod
metadata:
  labels:
    metrics: jmx-metrics
  name: jmx-metrics
  namespace: my-namespace
spec:
  containers:
  - image: tomcat:9.0
    name: tomcat
    ports:
    - containerPort: 9093
      name: http-metrics
-
Grant the Prometheus serviceAccount permissions on the target namespace
--- # create the Role in the target namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: my-namespace
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
--- # bind the prometheus-k8s serviceAccount to the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s # serviceAccount used by the Prometheus container; kube-prometheus uses prometheus-k8s by default
  namespace: monitoring
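One way to check that the Role and RoleBinding took effect is to ask the apiserver whether the serviceAccount may perform each verb on each resource. The loop below is a sketch that assumes `kubectl` is pointed at the cluster and that the current user is allowed to impersonate serviceAccounts. It does nothing when `kubectl` is not installed.

```shell
# ServiceAccount identity in the form kubectl expects for --as.
SA="system:serviceaccount:monitoring:prometheus-k8s"
NS="my-namespace"

# Skip quietly when kubectl is not available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  for resource in services endpoints pods; do
    for verb in get list watch; do
      # kubectl prints "yes" or "no" for each verb/resource pair.
      printf '%s %s: ' "$verb" "$resource"
      kubectl auth can-i "$verb" "$resource" --as="$SA" -n "$NS" || true
    done
  done
fi
```

Any "no" in the output points to a missing resource or verb in the Role, or to a RoleBinding created in the wrong namespace.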
-
View service discovery in the Prometheus web UI
Once service discovery is configured successfully, the new targets show up in the Prometheus web UI.
-
Add alerting rules
Create an alerting rule file, jvm-alert-rules.yaml, with the following content:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: jvm-metrics-rules
  namespace: monitoring
spec:
  groups:
  - name: jvm-metrics-rules
    rules:
    # GC time exceeds 10% of a 5-minute window (30s out of 300s)
    - alert: GcTimeTooMuch
      expr: increase(jvm_gc_collection_seconds_sum[5m]) > 30
      for: 5m
      labels:
        severity: red
      annotations:
        summary: "{{$labels.app}} GC time exceeds 10%"
        message: "ns:{{$labels.namespace}} pod:{{$labels.pod}} GC time exceeds 10%, current value ({{$value}}s in 5m)"
    # too many GCs
    - alert: GcCountTooMuch
      expr: increase(jvm_gc_collection_seconds_count[1m]) > 30
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{$labels.app}} more than 30 GCs in 1 minute"
        message: "ns:{{$labels.namespace}} pod:{{$labels.pod}} more than 30 GCs in 1 minute, current value ({{$value}})"
    # too many full GCs
    - alert: FgcCountTooMuch
      expr: increase(jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep"}[1h]) > 3
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{$labels.app}} more than 3 full GCs in 1 hour"
        message: "ns:{{$labels.namespace}} pod:{{$labels.pod}} more than 3 full GCs in 1 hour, current value ({{$value}})"
    # non-heap memory usage above 80%
    - alert: NonheapUsageTooMuch
      expr: jvm_memory_bytes_used{job="jmx-metrics", area="nonheap"} / jvm_memory_bytes_max * 100 > 80
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{$labels.app}} non-heap memory usage >80%"
        message: "ns:{{$labels.namespace}} pod:{{$labels.pod}} non-heap memory usage >80%, current value ({{$value}}%)"
    # memory usage warning
    - alert: HeighMemUsage
      expr: process_resident_memory_bytes{job="jmx-metrics"} / os_total_physical_memory_bytes * 100 > 85
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{$labels.app}} rss memory usage above 85%"
        message: "ns:{{$labels.namespace}} pod:{{$labels.pod}} rss memory usage above 85%, current value ({{$value}}%)"
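The `> 30` in the GcTimeTooMuch rule is the 10% figure expressed in seconds. The small calculation below shows where it comes from, and can be reused if you pick a different window or percentage. The variable names are mine, not part of any Prometheus configuration.

```shell
# Window used in the rule's range selector: [5m] expressed in seconds.
WINDOW_SECONDS=$((5 * 60))
# Share of the window that GC is allowed to consume, in percent.
THRESHOLD_PERCENT=10

# Seconds of GC time within the window that correspond to that share.
GC_SECONDS_LIMIT=$((WINDOW_SECONDS * THRESHOLD_PERCENT / 100))
echo "alert when increase(jvm_gc_collection_seconds_sum[5m]) > ${GC_SECONDS_LIMIT}"
```

Since `jvm_gc_collection_seconds_sum` carries a `gc` label, the comparison is evaluated per collector rather than on the total across collectors.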
Run
kubectl apply -f jvm-alert-rules.yaml
to apply the rules.
-
Add alert receivers
Edit the receiver configuration:
global:
  resolve_timeout: 5m
route:
  group_by: ['job', 'alertname', 'pod']
  group_interval: 2m
  receiver: my-alert-receiver
  routes:
  - match:
      job: jmx-metrics
    receiver: my-alert-receiver
    repeat_interval: 3h
receivers:
- name: my-alert-receiver
  webhook_configs:
  - url: http://mywebhook.com/
    max_alerts: 1
    send_resolved: true
Base64-encode this configuration with any tool and put the result into the Secret that holds the alert-manager configuration:
kubectl edit -n monitoring Secret alertmanager-main
apiVersion: v1
data:
  alertmanager.yaml: KICAgICJyZWNlaXZlciI6ICJudWxsIg== # put the base64 string here
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
The change takes effect shortly after you exit the editor.
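For the base64 step, the standard `base64` command-line tool is enough. The sketch below writes a shortened stand-in for the receiver configuration to a temporary file, encodes it as a single line, and decodes it again as a sanity check. The file content is an abbreviated placeholder, not the full configuration.

```shell
# Shortened stand-in for the receiver configuration, written to a temp file.
CONFIG_FILE=$(mktemp)
cat > "$CONFIG_FILE" <<'EOF'
route:
  receiver: my-alert-receiver
receivers:
- name: my-alert-receiver
  webhook_configs:
  - url: http://mywebhook.com/
EOF

# Encode and strip line wraps, since the Secret value must be one line.
ENCODED=$(base64 < "$CONFIG_FILE" | tr -d '\n')
echo "$ENCODED"

# Decode again to confirm the round trip before pasting into the Secret.
# (-d is the GNU spelling; some older macOS releases use -D instead.)
DECODED=$(printf '%s' "$ENCODED" | base64 -d)
```

The printed string is what goes into the `alertmanager.yaml` field of the Secret.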
With that, the JVM monitoring setup is complete.
Appendix: sample output from the jvm-exporter endpoint; pick whichever metrics you need.
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 218.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 40.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 219.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 249.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_threads_state Current count of threads by state
# TYPE jvm_threads_state gauge
jvm_threads_state{state="NEW",} 0.0
jvm_threads_state{state="RUNNABLE",} 49.0
jvm_threads_state{state="TIMED_WAITING",} 141.0
jvm_threads_state{state="TERMINATED",} 0.0
jvm_threads_state{state="WAITING",} 28.0
jvm_threads_state{state="BLOCKED",} 0.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_261-b12",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.553562144E9
jvm_memory_bytes_used{area="nonheap",} 6.5181496E7
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 4.08027136E9
jvm_memory_bytes_committed{area="nonheap",} 6.8747264E7
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.08027136E9
jvm_memory_bytes_max{area="nonheap",} 1.317011456E9
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 4.294967296E9
jvm_memory_bytes_init{area="nonheap",} 2555904.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 2.096832E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 3.9320064E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 4893112.0
jvm_memory_pool_bytes_used{pool="Par Eden Space",} 1.71496168E8
jvm_memory_pool_bytes_used{pool="Par Survivor Space",} 7.1602832E7
jvm_memory_pool_bytes_used{pool="CMS Old Gen",} 1.310463144E9
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.3396352E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 4.0239104E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 5111808.0
jvm_memory_pool_bytes_committed{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_committed{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_committed{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} 5.36870912E8
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 5.28482304E8
jvm_memory_pool_bytes_max{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_max{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_max{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_init{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_init{pool="CMS Old Gen",} 2.147483648E9
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP os_free_physical_memory_bytes FreePhysicalMemorySize (java.lang<type=OperatingSystem><>FreePhysicalMemorySize)
# TYPE os_free_physical_memory_bytes gauge
os_free_physical_memory_bytes 9.1234304E8
# HELP os_committed_virtual_memory_bytes CommittedVirtualMemorySize (java.lang<type=OperatingSystem><>CommittedVirtualMemorySize)
# TYPE os_committed_virtual_memory_bytes gauge
os_committed_virtual_memory_bytes 2.2226296832E10
# HELP os_total_swap_space_bytes TotalSwapSpaceSize (java.lang<type=OperatingSystem><>TotalSwapSpaceSize)
# TYPE os_total_swap_space_bytes gauge
os_total_swap_space_bytes 0.0
# HELP os_max_file_descriptor_count MaxFileDescriptorCount (java.lang<type=OperatingSystem><>MaxFileDescriptorCount)
# TYPE os_max_file_descriptor_count gauge
os_max_file_descriptor_count 1048576.0
# HELP os_system_load_average SystemLoadAverage (java.lang<type=OperatingSystem><>SystemLoadAverage)
# TYPE os_system_load_average gauge
os_system_load_average 4.97
# HELP os_total_physical_memory_bytes TotalPhysicalMemorySize (java.lang<type=OperatingSystem><>TotalPhysicalMemorySize)
# TYPE os_total_physical_memory_bytes gauge
os_total_physical_memory_bytes 1.073741824E10
# HELP os_system_cpu_load SystemCpuLoad (java.lang<type=OperatingSystem><>SystemCpuLoad)
# TYPE os_system_cpu_load gauge
os_system_cpu_load 1.0
# HELP os_free_swap_space_bytes FreeSwapSpaceSize (java.lang<type=OperatingSystem><>FreeSwapSpaceSize)
# TYPE os_free_swap_space_bytes gauge
os_free_swap_space_bytes 0.0
# HELP os_available_processors AvailableProcessors (java.lang<type=OperatingSystem><>AvailableProcessors)
# TYPE os_available_processors gauge
os_available_processors 6.0
# HELP os_process_cpu_load ProcessCpuLoad (java.lang<type=OperatingSystem><>ProcessCpuLoad)
# TYPE os_process_cpu_load gauge
os_process_cpu_load 0.14194299011052938
# HELP os_open_file_descriptor_count OpenFileDescriptorCount (java.lang<type=OperatingSystem><>OpenFileDescriptorCount)
# TYPE os_open_file_descriptor_count gauge
os_open_file_descriptor_count 717.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.004494197
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jmx_scrape_cached_beans Number of beans with their matching rule cached
# TYPE jmx_scrape_cached_beans gauge
jmx_scrape_cached_beans 0.0
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="direct",} 61.0
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="ParNew",} 77259.0
jvm_gc_collection_seconds_sum{gc="ParNew",} 2399.831
jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep",} 1.0
jvm_gc_collection_seconds_sum{gc="ConcurrentMarkSweep",} 0.29
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1759604.89
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.608630226597E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 717.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.2226292736E10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.644765696E9
# HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
# TYPE jmx_exporter_build_info gauge
jmx_exporter_build_info{version="0.14.0",name="jmx_prometheus_javaagent",} 1.0
# HELP jvm_memory_pool_allocated_bytes_total Total bytes allocated in a given JVM memory pool. Only updated after GC, not continuously.
# TYPE jvm_memory_pool_allocated_bytes_total counter
jvm_memory_pool_allocated_bytes_total{pool="Par Survivor Space",} 1.42928399936E11
jvm_memory_pool_allocated_bytes_total{pool="CMS Old Gen",} 2.862731656E9
jvm_memory_pool_allocated_bytes_total{pool="Code Cache",} 2.8398656E7
jvm_memory_pool_allocated_bytes_total{pool="Compressed Class Space",} 4912848.0
jvm_memory_pool_allocated_bytes_total{pool="Metaspace",} 3.9438872E7
jvm_memory_pool_allocated_bytes_total{pool="Par Eden Space",} 1.32737951722432E14
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 7282.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 7317.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 35.0
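To get a feel for how the alert expressions relate to this output, the following sketch recomputes the non-heap usage ratio used by the NonheapUsageTooMuch rule from four lines of the sample above, using plain `awk`. It only illustrates the arithmetic; Prometheus evaluates the real rule against scraped time series.

```shell
# Four lines copied from the sample output above.
SAMPLE=$(cat <<'EOF'
jvm_memory_bytes_used{area="heap",} 1.553562144E9
jvm_memory_bytes_used{area="nonheap",} 6.5181496E7
jvm_memory_bytes_max{area="heap",} 4.08027136E9
jvm_memory_bytes_max{area="nonheap",} 1.317011456E9
EOF
)

# Pick the nonheap used/max values and print used / max * 100.
NONHEAP_PCT=$(printf '%s\n' "$SAMPLE" | awk '
  $1 == "jvm_memory_bytes_used{area=\"nonheap\",}" { used = $2 + 0 }
  $1 == "jvm_memory_bytes_max{area=\"nonheap\",}"  { max = $2 + 0 }
  END { if (max > 0) printf "%.2f", used / max * 100 }
')
echo "nonheap usage: ${NONHEAP_PCT}%"
```

For this sample the ratio comes out just under 5%, well below the 80% alert threshold.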