关于prometheus:Prometheus监控Oracle数据库

86次阅读

共计 5437 个字符,预计需要花费 14 分钟才能阅读完成。

背景

本文简略介绍下,Prometheus 如何通过 exporters 监控 Oracle 数据库,以及应该留神哪些指标。

oracledb_exporter

oracledb_exporter 是一个连贯到 Oracle 数据库并生成 Prometheus metrics 的应用程序,

设置

展现下如何装置和设置 oracledb_exporter,以应用 Prometheus 来监控 Oracle 数据库。oracledb_exporter 部署在 k8s 集群中

在 k8s 应用 Deployment 部署 oracledb_exporter,并增加注解,以实现 Prometheus 主动发现 oracledb_exporter 断点并收集指标

spec:
 template:
   metadata:
     annotations:
       prometheus.io/scrape: "true"
       prometheus.io/port: "9161"
       prometheus.io/path: "/metrics"

oracledb_exporter 须要 Oracle 的连贯信息能力拜访和生成指标,此参数作为环境变量传递到 exporter。因为连贯信息蕴含用于拜访数据库的用户和明码,因而咱们将应用 Kubernetes Secret 来存储它。

要创立到 Oracle 数据库的连贯字符串的明码,能够应用以下命令:

kubectl create secret generic oracledb-exporter-secret \
    --from-literal=datasource='YOUR_CONNECTION_STRING'

在 deployment 中,这样配置环境变量

       env:
       - name: DATA_SOURCE_NAME
         valueFrom:
           secretKeyRef:
             name: oracledb-exporter-secret
             key: datasource

要确保连贯信息是否正确:

system/password@//database_url:1521/database_name.your.domain.com

能够应用 sqlplus docker 镜像进行检测

docker run --net='host' --rm --interactive guywithnose/sqlplus sqlplus system/password@//database_url:1521/database_name.my.domain.com

上面增加一些自定义指标,包含慢查问(slow queries),谬误查问(bug queries)
为了应用自定义指标:

  • 在 deployment 中,咱们将增加另一个环境变量,该变量具备到新指标的文件的路由。
  • 从 ConfigMap 将此新文件挂载为 volume

残缺配置如下:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oracledb-exporter
  namespace: database-namespace
spec:
  selector:
    matchLabels:
      app: oracledb-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: oracledb-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9161"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: oracledb-exporter
        ports:
        - containerPort: 9161
        image: iamseth/oracledb_exporter
        env:
        - name: DATA_SOURCE_NAME
          valueFrom:
            secretKeyRef:
              name: oracledb-exporter-secret
              key: datasource
        - name: CUSTOM_METRICS
          value: /tmp/custom-metrics.toml
        volumeMounts:
          - name:  custom-metrics
            mountPath:  /tmp/custom-metrics.toml
            subPath: custom-metrics.toml
      volumes:
        - name: custom-metrics
          configMap:
            defaultMode: 420
            name: custom-metrics    

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-metrics
  namespace: database-namespace
data:
  custom-metrics.toml: |
    [[metric]]
    context = "slow_queries"
    metricsdesc = {p95_time_usecs= "Gauge metric with percentile 95 of elapsed time.", p99_time_usecs= "Gauge metric with percentile 99 of elapsed time."}
    request = "select  percentile_disc(0.95)  within group (order by elapsed_time) as p95_time_usecs, percentile_disc(0.99)  within group (order by elapsed_time) as p99_time_usecs from v$sql where last_active_time >= sysdate - 5/(24*60)"
    [[metric]]
    context = "big_queries"
    metricsdesc = {p95_rows= "Gauge metric with percentile 95 of returned rows.", p99_rows= "Gauge metric with percentile 99 of returned rows."}
    request = "select  percentile_disc(0.95)  within group (order by rownum) as p95_rows, percentile_disc(0.99)  within group (order by rownum) as p99_rows from v$sql where last_active_time >= sysdate - 5/(24*60)"
    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {table_bytes="Gauge metric with the size of the tables in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_bytes from user_segments where segment_type='TABLE'group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from user_segments where segment_type='TABLE PARTITION'group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_user_segments_top100"
    metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as cluster_bytes from user_segments where segment_type='CLUSTER'group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {table_bytes="Gauge metric with the size of the tables in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_bytes from dba_segments where segment_type='TABLE'group by segment_name) order by table_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {table_partition_bytes="Gauge metric with the size of the table partition in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as table_partition_bytes from dba_segments where segment_type='TABLE PARTITION'group by segment_name) order by table_partition_bytes DESC FETCH NEXT 100 ROWS ONLY"
    [[metric]]
    context = "size_dba_segments_top100"
    metricsdesc = {cluster_bytes="Gauge metric with the size of the cluster in user segments."}
    labels = ["segment_name"]
    request = "select * from (select segment_name,sum(bytes) as cluster_bytes from dba_segments where segment_type='CLUSTER'group by segment_name) order by cluster_bytes DESC FETCH NEXT 100 ROWS ONLY"

创立 Secret 和 ConfigMap 之后,就能够利用 Deployment 并查看它是否正在从 Oracle 数据库的端口 9161 中获取指标。

如果一切正常,Prometheus 将主动发现 exporter 带正文的 pod,并在几分钟内开始抓取指标。能够在 Prometheus Web 界面的 target 局部中对其进行查看,以查找以 oracledb_结尾的任何指标。

监控什么

性能指标

等待时间: exporter 在 Oracle 数据库的不同流动中提供一系列等待时间的指标。它们都以 oracledb_wait_time_前缀结尾,它们有助于评估数据库在哪里破费了更多工夫。它能够存在于 I /O,网络,提交,并发等中。通过这种形式,咱们能够确定零碎中可能影响 Oracle 数据库整体性能的瓶颈。

慢查问:某些查问返回后果所花的工夫可能比其余查问长。如果此工夫高于应用程序中配置的接管响应的超时工夫,它将认为这是来自数据库的超时谬误,而后重试查问。这种行为可能会使零碎超负荷工作,并影响整体性能。

在下面显示的配置中,有两个自定义指标可提供最近 5 分钟内执行查问的响应工夫的百分比 95 和 99 的信息。这些指标是:

  • oracledb_slow_queries_p95_time_usecs
  • oracledb_slow_queries_p99_time_usecs

流动会话:监督 Oracle 数据库中流动会话很重要。如果超过配置的限度,则数据库将回绝新连贯,从而导致应用程序谬误。提供此信息的指标是 oracledb_sessions_value,标签 status 能够提供更多信息。

流动:监督数据库执行的操作也很重要。为此,咱们能够依附以下指标:

  • oracledb_activity_execute_count
  • oracledb_activity_parse_count_total
  • oracledb_activity_user_commits
  • oracledb_activity_user_rollbacks

正文完
 0