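Background: the article "Monitoring the JVM with Prometheus + Grafana" showed how to monitor a Java application with jvm-exporter. In our case the JVMs to monitor run inside a k8s cluster, so this post covers extending the k8s and Prometheus integration. We assume Prometheus has already been deployed to the k8s cluster (see "Integrating Prometheus + Grafana monitoring with Kubernetes"). kube-prometheus does not bundle jvm-exporter, so we have to wire it up ourselves.
Integrating jvm-exporter into our application
Integration is simple: add jvm-exporter as a javaagent to the Java startup command. See "Monitoring the JVM with Prometheus + Grafana" for details.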
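Configuring Prometheus service discovery
For workloads exposed through a Service, we can configure service discovery with the ServiceMonitor CRD defined by the prometheus-operator project. A template follows: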
--- # ServiceMonitor: service auto-discovery rule
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # take the job label of scraped data from the value of the "metrics" label, i.e. job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # discover Services labelled metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; several may be listed
    - my-namespace
  endpoints:
  - port: http-metrics # port to scrape; this is the Service port name, i.e. spec.ports.name in the Service yaml
    interval: 15s # scrape interval
--- # Service template
apiVersion: v1
kind: Service
metadata:
  labels:
    metrics: jmx-metrics # the key label that the ServiceMonitor matches on
  name: jmx-metrics
  namespace: my-namespace
spec:
  ports:
  - name: http-metrics # matches spec.endpoints.port in the ServiceMonitor
    port: 9093 # service port exposed by jmx-exporter
    targetPort: http-metrics # port name exposed in the pod yaml
  selector:
    metrics: jmx-metrics # the Service's own label selector
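The configuration above sets up service discovery for the jmx-metrics Service in the my-namespace namespace. Prometheus adds every pod behind this Service to monitoring and fetches the latest pod list from the apiserver, so new replicas are picked up automatically when the service scales out.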
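The selection logic described above can be sketched in a few lines of Python. This is a simplified model for illustration only, not the operator's actual code: the `Service` class and `is_selected` function are hypothetical names, and only `matchLabels` and `matchNames` are modelled (`matchExpressions` and `any: true` are left out).

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stand-in holding only the fields the selector looks at."""
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)


def is_selected(svc, match_labels, match_names):
    """True when svc lives in an allowed namespace and carries every matchLabels pair."""
    if svc.namespace not in match_names:
        return False
    return all(svc.labels.get(key) == value for key, value in match_labels.items())


services = [
    Service("jmx-metrics", "my-namespace", {"metrics": "jmx-metrics"}),
    Service("other", "my-namespace", {"app": "other"}),           # wrong labels
    Service("jmx-metrics", "default", {"metrics": "jmx-metrics"}),  # wrong namespace
]

selected = [
    s for s in services
    if is_selected(s, {"metrics": "jmx-metrics"}, ["my-namespace"])
]
print([f"{s.namespace}/{s.name}" for s in selected])
```

Only the first Service passes both checks, which is why the label on the Service and the namespace list in the ServiceMonitor both have to line up.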
For workloads that have no Service, such as instances exposed outside the cluster through HostPort, we can use a PodMonitor for service discovery. An example follows:
--- # PodMonitor: auto-discovery rule; supported by recent versions, older versions may not support it
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor # CRD defined by prometheus-operator
metadata:
  name: jmx-metrics
  namespace: monitoring
  labels:
    k8s-apps: jmx-metrics
spec:
  jobLabel: metrics # take the job label of scraped data from the value of the "metrics" label, i.e. job=jmx-metrics
  selector:
    matchLabels:
      metrics: jmx-metrics # discover pods labelled metrics: jmx-metrics
  namespaceSelector:
    matchNames: # namespaces to discover in; several may be listed
    - my-namespace
  podMetricsEndpoints:
  - port: http-metrics # name of the metrics port in the Pod yaml, i.e. spec.containers[].ports[].name
    interval: 15s # scrape interval
--- # Template for the Pod to monitor
apiVersion: v1
kind: Pod
metadata:
  labels:
    metrics: jmx-metrics
  name: jmx-metrics
  namespace: my-namespace
spec:
  containers:
  - image: tomcat:9.0
    name: tomcat
    ports:
    - containerPort: 9093
      name: http-metrics
- Grant the Prometheus serviceAccount permissions in the target namespace
--- # Create the Role in the target namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: my-namespace
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
--- # Bind the prometheus-k8s Role to the Prometheus ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s # ServiceAccount used by the Prometheus container; kube-prometheus uses prometheus-k8s by default
  namespace: monitoring
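Checking service discovery in the Prometheus web UI
Once service discovery is configured successfully, the discovered targets appear in the Prometheus web UI.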
Adding alert rules
Create an alert rule file named jvm-alert-rules.yaml with the following content:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: jvm-metrics-rules
  namespace: monitoring
spec:
  groups:
  - name: jvm-metrics-rules
    rules:
    # GC takes more than 10% of the time within 5 minutes (30s out of 300s)
    - alert: GcTimeTooMuch
      expr: increase(jvm_gc_collection_seconds_sum[5m]) > 30
      for: 5m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} GC time share exceeds 10%"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} GC time share exceeds 10%, GC time in the last 5m ({{ $value }}s)"
    # Too many GCs
    - alert: GcCountTooMuch
      expr: increase(jvm_gc_collection_seconds_count[1m]) > 30
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} more than 30 GCs in 1 minute"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 30 GCs in 1 minute, current value ({{ $value }})"
    # Too many full GCs
    - alert: FgcCountTooMuch
      expr: increase(jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep"}[1h]) > 3
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} more than 3 full GCs in 1 hour"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 3 full GCs in 1 hour, current value ({{ $value }})"
    # Non-heap memory usage above 80%
    - alert: NonheapUsageTooMuch
      expr: jvm_memory_bytes_used{job="jmx-metrics", area="nonheap"} / jvm_memory_bytes_max * 100 > 80
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} non-heap memory usage > 80%"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} non-heap memory usage > 80%, current value ({{ $value }}%)"
    # Memory usage warning
    - alert: HeighMemUsage
      expr: process_resident_memory_bytes{job="jmx-metrics"} / os_total_physical_memory_bytes * 100 > 85
      for: 1m
      labels:
        severity: red
      annotations:
        summary: "{{ $labels.app }} RSS memory usage above 85%"
        message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} RSS memory usage above 85%, current value ({{ $value }}%)"
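The `> 30` in the GcTimeTooMuch rule is a number of seconds, not a percentage: `increase(jvm_gc_collection_seconds_sum[5m])` yields GC seconds accumulated over a 300-second window, and 10% of 300 s is 30 s. A small sketch of that arithmetic, with hypothetical helper names, in case the window or the percentage needs adjusting:

```python
def gc_seconds_threshold(window_seconds: float, max_percent: float) -> float:
    """Seconds of GC time inside the window that correspond to max_percent of it."""
    return window_seconds * max_percent / 100


def gc_time_percent(gc_seconds: float, window_seconds: float) -> float:
    """Share of the window spent in GC, as a percentage."""
    return gc_seconds * 100 / window_seconds


# 10% of a 5-minute window, the value used in the rule's expr.
print(gc_seconds_threshold(5 * 60, 10))

# Converting an alert value (seconds) back into a percentage for display.
print(gc_time_percent(45, 5 * 60))
```

The same conversion applies to any other window, for example a 10-minute window at 10% would need a threshold of 60 seconds.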
Run
kubectl apply -f jvm-alert-rules.yaml
to make the rules take effect.
Adding alert receivers
Edit the receiver configuration:
global:
  resolve_timeout: 5m
route:
  group_by: ['job', 'alertname', 'pod']
  group_interval: 2m
  receiver: my-alert-receiver
  routes:
  - match:
      job: jmx-metrics
    receiver: my-alert-receiver
    repeat_interval: 3h
receivers:
- name: my-alert-receiver
  webhook_configs:
  - url: http://mywebhook.com/
    max_alerts: 1
    send_resolved: true
Use a tool to base64-encode this configuration, then paste the result into the corresponding Alertmanager configuration Secret:
kubectl edit -n monitoring Secret alertmanager-main
apiVersion: v1
data:
  alertmanager.yaml: KICAgICJyZWNlaXZlciI6ICJudWxsIg== # paste the base64 value here
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
After you exit the editor, wait a moment for the change to take effect.
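Any base64 tool works for the encoding step. As one possible approach, here is a minimal sketch using only Python's standard library; the function names and the short sample configuration are illustrative, not part of the original setup.

```python
import base64


def encode_secret_value(config_text: str) -> str:
    """Base64-encode a config file's text for use under a Secret's data field."""
    return base64.b64encode(config_text.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse of encode_secret_value, handy for checking what a Secret holds."""
    return base64.b64decode(encoded).decode("utf-8")


# A shortened sample; in practice this would be the full receiver configuration.
sample_config = "global:\n  resolve_timeout: 5m\n"

encoded = encode_secret_value(sample_config)
print(encoded)

# Round-trip check: decoding must give back exactly what was encoded.
assert decode_secret_value(encoded) == sample_config
```

Decoding the existing value before editing is a cheap way to confirm that the Secret currently holds the configuration you expect.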
With that, the JVM monitoring setup is complete.
Appendix: sample output from the jvm-exporter endpoint. Pick whichever metrics you need.
# HELP jvm_threads_current Current thread count of a JVM
# TYPE jvm_threads_current gauge
jvm_threads_current 218.0
# HELP jvm_threads_daemon Daemon thread count of a JVM
# TYPE jvm_threads_daemon gauge
jvm_threads_daemon 40.0
# HELP jvm_threads_peak Peak thread count of a JVM
# TYPE jvm_threads_peak gauge
jvm_threads_peak 219.0
# HELP jvm_threads_started_total Started thread count of a JVM
# TYPE jvm_threads_started_total counter
jvm_threads_started_total 249.0
# HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
# TYPE jvm_threads_deadlocked gauge
jvm_threads_deadlocked 0.0
# HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
# TYPE jvm_threads_deadlocked_monitor gauge
jvm_threads_deadlocked_monitor 0.0
# HELP jvm_threads_state Current count of threads by state
# TYPE jvm_threads_state gauge
jvm_threads_state{state="NEW",} 0.0
jvm_threads_state{state="RUNNABLE",} 49.0
jvm_threads_state{state="TIMED_WAITING",} 141.0
jvm_threads_state{state="TERMINATED",} 0.0
jvm_threads_state{state="WAITING",} 28.0
jvm_threads_state{state="BLOCKED",} 0.0
# HELP jvm_info JVM version info
# TYPE jvm_info gauge
jvm_info{version="1.8.0_261-b12",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.553562144E9
jvm_memory_bytes_used{area="nonheap",} 6.5181496E7
# HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_committed gauge
jvm_memory_bytes_committed{area="heap",} 4.08027136E9
jvm_memory_bytes_committed{area="nonheap",} 6.8747264E7
# HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.08027136E9
jvm_memory_bytes_max{area="nonheap",} 1.317011456E9
# HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_init gauge
jvm_memory_bytes_init{area="heap",} 4.294967296E9
jvm_memory_bytes_init{area="nonheap",} 2555904.0
# HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_used gauge
jvm_memory_pool_bytes_used{pool="Code Cache",} 2.096832E7
jvm_memory_pool_bytes_used{pool="Metaspace",} 3.9320064E7
jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 4893112.0
jvm_memory_pool_bytes_used{pool="Par Eden Space",} 1.71496168E8
jvm_memory_pool_bytes_used{pool="Par Survivor Space",} 7.1602832E7
jvm_memory_pool_bytes_used{pool="CMS Old Gen",} 1.310463144E9
# HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_committed gauge
jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.3396352E7
jvm_memory_pool_bytes_committed{pool="Metaspace",} 4.0239104E7
jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 5111808.0
jvm_memory_pool_bytes_committed{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_committed{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_committed{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_max gauge
jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
jvm_memory_pool_bytes_max{pool="Metaspace",} 5.36870912E8
jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 5.28482304E8
jvm_memory_pool_bytes_max{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_max{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_max{pool="CMS Old Gen",} 2.147483648E9
# HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
# TYPE jvm_memory_pool_bytes_init gauge
jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
jvm_memory_pool_bytes_init{pool="Par Eden Space",} 1.718091776E9
jvm_memory_pool_bytes_init{pool="Par Survivor Space",} 2.14695936E8
jvm_memory_pool_bytes_init{pool="CMS Old Gen",} 2.147483648E9
# HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
# TYPE jmx_config_reload_failure_total counter
jmx_config_reload_failure_total 0.0
# HELP os_free_physical_memory_bytes FreePhysicalMemorySize (java.lang<type=OperatingSystem><>FreePhysicalMemorySize)
# TYPE os_free_physical_memory_bytes gauge
os_free_physical_memory_bytes 9.1234304E8
# HELP os_committed_virtual_memory_bytes CommittedVirtualMemorySize (java.lang<type=OperatingSystem><>CommittedVirtualMemorySize)
# TYPE os_committed_virtual_memory_bytes gauge
os_committed_virtual_memory_bytes 2.2226296832E10
# HELP os_total_swap_space_bytes TotalSwapSpaceSize (java.lang<type=OperatingSystem><>TotalSwapSpaceSize)
# TYPE os_total_swap_space_bytes gauge
os_total_swap_space_bytes 0.0
# HELP os_max_file_descriptor_count MaxFileDescriptorCount (java.lang<type=OperatingSystem><>MaxFileDescriptorCount)
# TYPE os_max_file_descriptor_count gauge
os_max_file_descriptor_count 1048576.0
# HELP os_system_load_average SystemLoadAverage (java.lang<type=OperatingSystem><>SystemLoadAverage)
# TYPE os_system_load_average gauge
os_system_load_average 4.97
# HELP os_total_physical_memory_bytes TotalPhysicalMemorySize (java.lang<type=OperatingSystem><>TotalPhysicalMemorySize)
# TYPE os_total_physical_memory_bytes gauge
os_total_physical_memory_bytes 1.073741824E10
# HELP os_system_cpu_load SystemCpuLoad (java.lang<type=OperatingSystem><>SystemCpuLoad)
# TYPE os_system_cpu_load gauge
os_system_cpu_load 1.0
# HELP os_free_swap_space_bytes FreeSwapSpaceSize (java.lang<type=OperatingSystem><>FreeSwapSpaceSize)
# TYPE os_free_swap_space_bytes gauge
os_free_swap_space_bytes 0.0
# HELP os_available_processors AvailableProcessors (java.lang<type=OperatingSystem><>AvailableProcessors)
# TYPE os_available_processors gauge
os_available_processors 6.0
# HELP os_process_cpu_load ProcessCpuLoad (java.lang<type=OperatingSystem><>ProcessCpuLoad)
# TYPE os_process_cpu_load gauge
os_process_cpu_load 0.14194299011052938
# HELP os_open_file_descriptor_count OpenFileDescriptorCount (java.lang<type=OperatingSystem><>OpenFileDescriptorCount)
# TYPE os_open_file_descriptor_count gauge
os_open_file_descriptor_count 717.0
# HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
# TYPE jmx_scrape_duration_seconds gauge
jmx_scrape_duration_seconds 0.004494197
# HELP jmx_scrape_error Non-zero if this scrape failed.
# TYPE jmx_scrape_error gauge
jmx_scrape_error 0.0
# HELP jmx_scrape_cached_beans Number of beans with their matching rule cached
# TYPE jmx_scrape_cached_beans gauge
jmx_scrape_cached_beans 0.0
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="direct",} 2.3358974E7
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="direct",} 61.0
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="ParNew",} 77259.0
jvm_gc_collection_seconds_sum{gc="ParNew",} 2399.831
jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep",} 1.0
jvm_gc_collection_seconds_sum{gc="ConcurrentMarkSweep",} 0.29
# HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
# TYPE jmx_config_reload_success_total counter
jmx_config_reload_success_total 0.0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1759604.89
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.608630226597E9
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 717.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.2226292736E10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.644765696E9
# HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
# TYPE jmx_exporter_build_info gauge
jmx_exporter_build_info{version="0.14.0",name="jmx_prometheus_javaagent",} 1.0
# HELP jvm_memory_pool_allocated_bytes_total Total bytes allocated in a given JVM memory pool. Only updated after GC, not continuously.
# TYPE jvm_memory_pool_allocated_bytes_total counter
jvm_memory_pool_allocated_bytes_total{pool="Par Survivor Space",} 1.42928399936E11
jvm_memory_pool_allocated_bytes_total{pool="CMS Old Gen",} 2.862731656E9
jvm_memory_pool_allocated_bytes_total{pool="Code Cache",} 2.8398656E7
jvm_memory_pool_allocated_bytes_total{pool="Compressed Class Space",} 4912848.0
jvm_memory_pool_allocated_bytes_total{pool="Metaspace",} 3.9438872E7
jvm_memory_pool_allocated_bytes_total{pool="Par Eden Space",} 1.32737951722432E14
# HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
# TYPE jvm_classes_loaded gauge
jvm_classes_loaded 7282.0
# HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
# TYPE jvm_classes_loaded_total counter
jvm_classes_loaded_total 7317.0
# HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
# TYPE jvm_classes_unloaded_total counter
jvm_classes_unloaded_total 35.0
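To pick individual metrics out of output like this, a small parser is enough for quick checks. The sketch below is a deliberately minimal reader, not a full implementation of the Prometheus text format: it assumes one sample per line with no timestamp, and `parse_metrics` and `value_of` are hypothetical helper names. It reproduces the non-heap usage calculation from the alert rules with four lines taken from the sample.

```python
import re

# name, optional {labels}, then a single numeric value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # line shape not covered by this sketch
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def value_of(samples, name, **labels):
    """First sample with this name whose labels include all the given pairs."""
    for sample_name, sample_labels, value in samples:
        if sample_name == name and all(
            sample_labels.get(k) == v for k, v in labels.items()
        ):
            return value
    raise KeyError(name)


text = """\
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 1.553562144E9
jvm_memory_bytes_used{area="nonheap",} 6.5181496E7
# TYPE jvm_memory_bytes_max gauge
jvm_memory_bytes_max{area="heap",} 4.08027136E9
jvm_memory_bytes_max{area="nonheap",} 1.317011456E9
"""

samples = parse_metrics(text)
used = value_of(samples, "jvm_memory_bytes_used", area="nonheap")
limit = value_of(samples, "jvm_memory_bytes_max", area="nonheap")
nonheap_percent = used / limit * 100
print(f"non-heap usage: {nonheap_percent:.2f}%")
```

For the sample above the result is just under 5%, well below the 80% threshold used in NonheapUsageTooMuch.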