File Systems: Building a Visual Monitoring System for JuiceFS with Grafana + Prometheus

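For a distributed file system that stores massive amounts of data, users usually need an intuitive view of how system-wide metrics change over time, such as capacity, file count, CPU load, disk I/O, and cache.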

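JuiceFS does not reinvent the wheel. Instead, it exposes real-time status data through a Prometheus-compatible API. You only need to add this API to your self-hosted Prometheus Server to build up time-series data, and then use tools such as Grafana to monitor the JuiceFS file system visually.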

Quick start

Here we assume that your Prometheus Server, Grafana, and JuiceFS client all run on the same host, where:

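Prometheus Server: collects and stores the time-series data of all metrics. For installation, see the official documentation.
Grafana: reads time-series data from Prometheus and visualizes it. For installation, see the official documentation.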

Ⅰ. Get real-time data

JuiceFS exposes its data through a Prometheus-style API. Once the file system is mounted, the real-time monitoring data output by the client is available by default at http://localhost:9567/metrics.
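To read this endpoint from a script rather than a browser, the following minimal Python sketch fetches the URL and parses the Prometheus text exposition format into a dictionary. It uses only the standard library, and the parser is deliberately simplified rather than a complete implementation of the format.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition format into {series: value}.

    Simplified sketch: comment lines (# HELP / # TYPE) are skipped, an
    optional trailing timestamp is ignored, and escaping rules are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # The series is the metric name plus its {label="value",...} block
            end = line.rindex("}") + 1
            series, fields = line[:end], line[end:].split()
        else:
            series, *fields = line.split()
        samples[series] = float(fields[0])  # float() also accepts "+Inf" and "NaN"
    return samples


def fetch_metrics(url="http://localhost:9567/metrics", timeout=5):
    """Fetch and parse the metrics endpoint of a mounted JuiceFS client."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Usage (requires a mounted JuiceFS client on this host):
# for series, value in sorted(fetch_metrics().items()):
#     print(series, value)
```

The series string, labels included, is used as the key so that the same metric reported for different mount points stays distinct.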

Ⅱ. Add the API to Prometheus Server

Edit the Prometheus configuration file and add a new job that points to the JuiceFS API address, for example:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "juicefs"
    static_configs:
      - targets: ["localhost:9567"]

Assuming the configuration file is named prometheus.yml, start the service with this configuration:

./prometheus --config.file=prometheus.yml

Visit http://localhost:9090 to see the Prometheus web UI.
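Once Prometheus is running, one way to confirm that it is actually scraping the juicefs job is to query the built-in up metric (1 means the last scrape succeeded) through the Prometheus HTTP API. The following is a minimal sketch that assumes Prometheus listens on http://localhost:9090 as configured above:

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"


def query_url(expr, base=PROMETHEUS):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def targets_up(response):
    """Map each scraped instance to True (up == 1) or False from a query response."""
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    return {
        r["metric"].get("instance", "?"): r["value"][1] == "1"  # values arrive as strings
        for r in response["data"]["result"]
    }


def check_juicefs_targets(base=PROMETHEUS, timeout=5):
    """Ask Prometheus whether the targets of the "juicefs" job are up."""
    url = query_url('up{job="juicefs"}', base)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return targets_up(json.load(resp))


# Usage (requires the Prometheus server started above):
# print(check_juicefs_targets())
```

An empty result means Prometheus has no juicefs target at all, which usually points to the configuration file rather than to JuiceFS.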

Ⅲ. Display Prometheus data in Grafana

Create a new Data Source as shown in the figure below:

Name: for easy identification, you can use the name of the file system.
URL: the Prometheus data endpoint, http://localhost:9090 by default.

Then use grafana_template.json to create a dashboard. Open the new dashboard to see the visual charts of the file system:
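Both steps can also be scripted against the Grafana HTTP API, which helps when provisioning several hosts. This sketch assumes Grafana listens on its default port 3000 and that you have an API token. The data source name myjfs and the token are placeholders. If the template was exported with ${DS_...} input placeholders, importing it through the Grafana UI, which prompts for the data source, may be simpler.

```python
import json
import urllib.request

GRAFANA = "http://localhost:3000"  # assumption: Grafana on its default port


def datasource_payload(name, prometheus_url="http://localhost:9090"):
    """Body for POST /api/datasources: a Prometheus source queried via the Grafana server."""
    return {"name": name, "type": "prometheus", "url": prometheus_url, "access": "proxy"}


def dashboard_payload(template_path):
    """Body for POST /api/dashboards/db built from a dashboard JSON file."""
    with open(template_path, encoding="utf-8") as f:
        dashboard = json.load(f)
    dashboard["id"] = None  # let Grafana assign an id to the new dashboard
    return {"dashboard": dashboard, "overwrite": True}


def grafana_post(path, payload, token, base=GRAFANA):
    """POST a JSON payload to the Grafana HTTP API using a bearer token."""
    req = urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Usage (hypothetical token; requires a running Grafana):
# grafana_post("/api/datasources", datasource_payload("myjfs"), token="<token>")
# grafana_post("/api/dashboards/db", dashboard_payload("grafana_template.json"), token="<token>")
```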

Collecting metrics

How metrics are collected depends on how JuiceFS is deployed. Each case is described below.

Mount point

After a JuiceFS file system is mounted with the juicefs mount command, metrics can be collected from http://localhost:9567/metrics. You can also change this address with the --metrics option, for example:

$ juicefs mount --metrics localhost:9567 ...

You can view these metrics with a command-line tool:

$ curl http://localhost:9567/metrics

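In addition, the root directory of every JuiceFS file system contains a hidden file named .stats, which can also be used to view metrics. For example (assuming the mount point is /jfs):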

$ cat /jfs/.stats
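Because .stats is an ordinary hidden file, a script can sample it twice and compute per-second rates, a rough equivalent of a Prometheus rate() query on counters. The sketch below assumes each line has the form `<name> <value>`. Check this against the actual output of your JuiceFS version, because the format is not described here. Rates are only meaningful for counters. For gauges, the difference is just the change between the two samples.

```python
def parse_stats(text):
    """Parse "<name> <value>" lines into {name: value}; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            stats[parts[0]] = float(parts[1])
        except ValueError:
            continue  # non-numeric value: does not fit the assumed format
    return stats


def read_stats(path="/jfs/.stats"):
    """Take one snapshot of the hidden .stats file."""
    with open(path, encoding="utf-8") as f:
        return parse_stats(f.read())


def rates(before, after, seconds):
    """Per-second change of every metric present in both snapshots."""
    return {k: (after[k] - before[k]) / seconds for k in after if k in before}


# Usage (requires JuiceFS mounted at /jfs):
# import time
# a = read_stats(); time.sleep(10); b = read_stats()
# print(rates(a, b, 10))
```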

Kubernetes

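By default, the JuiceFS CSI driver exposes metrics on port 9567 of the mount pod. You can also change this by adding a metrics option to mountOptions (for how to modify mountOptions, see the CSI driver documentation), for example: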

apiVersion: v1
kind: PersistentVolume
metadata:
  name: juicefs-pv
  labels:
    juicefs-name: ten-pb-fs
spec:
  ...
  mountOptions:
    - metrics=0.0.0.0:9567

Add a new scrape job to prometheus.yml to collect the metrics:

scrape_configs:
  - job_name: 'juicefs'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
      action: keep
      regex: juicefs-mount
    - source_labels: [__address__]
      action: replace
      regex: ([^:]+)(:\d+)?
      replacement: $1:9567
      target_label: __address__
    - source_labels: [__meta_kubernetes_pod_node_name]
      target_label: node
      action: replace

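This assumes the Prometheus server runs inside the Kubernetes cluster. If your Prometheus server runs outside the cluster, make sure it can reach the Kubernetes nodes. Then add the api_server and tls_config settings to the file above, as described in this issue: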

scrape_configs:
  - job_name: 'juicefs'
    kubernetes_sd_configs:
    - api_server: <Kubernetes API Server>
      role: pod
      tls_config:
        ca_file: <...>
        cert_file: <...>
        key_file: <...>
        insecure_skip_verify: false
    relabel_configs:
    ...
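To see what the relabel rules in the Kubernetes scrape job do to each discovered pod, the following sketch reproduces them in Python. It is a simplified model, not Prometheus code. Prometheus anchors relabel regexes at both ends, which re.fullmatch mimics. With role: pod, __address__ is the pod IP plus a declared container port, or the bare pod IP when no port is declared, which is why the port group in ([^:]+)(:\d+)? is optional. The pod labels below are made up.

```python
import re

# Regexes from the relabel rules; Prometheus anchors them at both ends.
APP_RE = re.compile(r"juicefs-mount")
ADDRESS_RE = re.compile(r"([^:]+)(:\d+)?")


def relabel(labels, port=9567):
    """Apply the three rules to one pod's labels; return None if the pod is dropped."""
    # Rule 1, action: keep -> only pods labelled app.kubernetes.io/name=juicefs-mount
    app = labels.get("__meta_kubernetes_pod_label_app_kubernetes_io_name", "")
    if not APP_RE.fullmatch(app):
        return None
    out = dict(labels)
    # Rule 2, action: replace with "$1:9567" -> keep the host, force the metrics port
    m = ADDRESS_RE.fullmatch(labels["__address__"])
    if m:
        out["__address__"] = f"{m.group(1)}:{port}"
    # Rule 3, action: replace (default regex "(.*)") -> copy the node name into "node"
    node = labels.get("__meta_kubernetes_pod_node_name", "")
    if node:
        out["node"] = node
    return out


pod = {
    "__address__": "10.0.0.5:8080",  # made-up pod IP and container port
    "__meta_kubernetes_pod_label_app_kubernetes_io_name": "juicefs-mount",
    "__meta_kubernetes_pod_node_name": "node-1",
}
print(relabel(pod)["__address__"])  # 10.0.0.5:9567
```

The same model applies to the S3 gateway job, with juicefs-s3-gateway in place of juicefs-mount.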

S3 gateway

By default, the JuiceFS S3 gateway exposes metrics at http://localhost:9567/metrics. You can also change this address with the --metrics option, for example:

$ juicefs gateway --metrics localhost:9567 ...

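If you deploy the JuiceFS S3 gateway in Kubernetes, you can collect metrics with a Prometheus configuration similar to the one in the Kubernetes section. The main difference is the regular expression for the __meta_kubernetes_pod_label_app_kubernetes_io_name label. For example: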

scrape_configs:
  - job_name: 'juicefs-s3-gateway'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
        action: keep
        regex: juicefs-s3-gateway
      - source_labels: [__address__]
        action: replace
        regex: ([^:]+)(:\d+)?
        replacement: $1:9567
        target_label: __address__
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
        action: replace

Collecting with Prometheus Operator

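Prometheus Operator lets users quickly deploy and manage Prometheus in Kubernetes. The ServiceMonitor CRD it provides can generate scrape configurations automatically. For example (assuming the Service of the JuiceFS S3 gateway is deployed in the kube-system namespace):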

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: juicefs-s3-gateway
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: juicefs-s3-gateway
  endpoints:
    - port: metrics
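A ServiceMonitor only finds a Service if both of its selectors match. The endpoint `port: metrics` refers to a named port on that Service, so the Service (not shown in the example above) must declare a port called metrics. The Python sketch below is a simplified model of these matching rules, not the Operator's implementation, and the Service description in it is hypothetical.

```python
def selected(service, match_names, match_labels):
    """Would a ServiceMonitor with these selectors pick up the given Service?"""
    labels = service.get("labels", {})
    # namespaceSelector.matchNames: the Service must live in one of the listed namespaces
    if service.get("namespace") not in match_names:
        return False
    # selector.matchLabels: every key/value pair must be present; extra labels are fine
    return all(labels.get(k) == v for k, v in match_labels.items())


def named_port(service, name="metrics"):
    """Find the Service port entry that an endpoint "port: <name>" refers to."""
    return next((p for p in service.get("ports", []) if p.get("name") == name), None)


gateway_svc = {  # hypothetical Service for the S3 gateway
    "namespace": "kube-system",
    "labels": {"app.kubernetes.io/name": "juicefs-s3-gateway"},
    "ports": [{"name": "metrics", "port": 9567}],
}
print(selected(gateway_svc, ["kube-system"], {"app.kubernetes.io/name": "juicefs-s3-gateway"}))  # True
print(named_port(gateway_svc))
```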

Hadoop

The JuiceFS Hadoop Java SDK can report metrics to Pushgateway or Graphite.

Pushgateway

Enable reporting metrics to Pushgateway:

<property>
  <name>juicefs.push-gateway</name>
  <value>host:port</value>
</property>

You can change the reporting frequency with the juicefs.push-interval setting. By default, metrics are reported every 10 seconds.

As recommended by the official Pushgateway documentation, set honor_labels: true in the Prometheus scrape configuration.

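Note that the timestamp of metrics Prometheus scrapes from Pushgateway is not the time the JuiceFS Hadoop Java SDK reported them. It is the time of the scrape. For details, see the official Pushgateway documentation.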

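By default, Pushgateway keeps metrics only in memory. To persist them to disk, use the --persistence.file option to set the file path and the --persistence.interval option to set how often they are saved (every 5 minutes by default).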

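Every process that uses the JuiceFS Hadoop Java SDK produces its own unique metrics, and Pushgateway remembers every metric it has ever received. As a result, metrics keep piling up, consume too much memory, and slow down Prometheus scrapes. Clean up the metrics on Pushgateway regularly.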
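A toy model of Pushgateway's storage shows why the metrics pile up. Pushed samples are kept per grouping key until they are deleted, and nothing expires on its own. In the following Python sketch, which is not Pushgateway's code, every short-lived process pushes under its own grouping key. The pid grouping label and the metric name are hypothetical stand-ins for whatever makes each SDK process's metrics unique.

```python
class PushgatewayModel:
    """Toy in-memory model of Pushgateway storage (not its real implementation)."""

    def __init__(self):
        self.groups = {}  # grouping key -> {metric name: value}

    def push(self, job, grouping, samples):
        key = (job, tuple(sorted(grouping.items())))
        self.groups[key] = dict(samples)  # PUT semantics: replace the whole group

    def wipe(self):
        self.groups.clear()  # models the admin "wipe": every group is deleted


pg = PushgatewayModel()
# Three short-lived processes each push once and then exit (hypothetical pid label)
for pid in ("101", "102", "103"):
    pg.push("juicefs", {"pid": pid}, {"juicefs_demo_metric": 1.0})
print(len(pg.groups))  # 3: groups of exited processes are still there

pg.wipe()
pg.push("juicefs", {"pid": "104"}, {"juicefs_demo_metric": 1.0})  # a live process keeps pushing
print(len(pg.groups))  # 1
```

The model also shows why clearing is harmless for running SDK instances: the next push from a live process recreates its group, while the groups of exited processes stay gone.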

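Run the following command periodically to clear Pushgateway's metric data. Clearing metrics does not stop running JuiceFS Hadoop Java SDK instances from continuing to report data. Pushgateway must be started with the --web.enable-admin-api option, and the command deletes all metrics in Pushgateway.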

$ curl -X PUT http://host:9091/api/v1/admin/wipe
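If curl plus cron is not convenient, you can send the same request from a small Python job. This sketch issues PUT /api/v1/admin/wipe on a fixed schedule. The interval and the host:9091 address are placeholders, and the admin API must be enabled as noted above.

```python
import time
import urllib.request


def wipe_request(base="http://host:9091"):
    """Build the PUT request that deletes all metrics on a Pushgateway."""
    return urllib.request.Request(base.rstrip("/") + "/api/v1/admin/wipe", method="PUT")


def wipe_periodically(base="http://host:9091", interval_s=3600, rounds=None):
    """Wipe every interval_s seconds; run forever when rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        with urllib.request.urlopen(wipe_request(base), timeout=10) as resp:
            print("wipe ->", resp.status)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)


# Usage (placeholder address; Pushgateway needs --web.enable-admin-api):
# wipe_periodically("http://host:9091", interval_s=3600)
```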

Graphite

Enable reporting metrics to Graphite:

<property>
  <name>juicefs.push-graphite</name>
  <value>host:port</value>
</property>

You can change the reporting frequency with the juicefs.push-interval setting. By default, metrics are reported every 10 seconds.
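You can send a test line by hand to check that the host:port in juicefs.push-graphite is reachable, or to see what arriving data looks like. The following minimal sketch assumes the endpoint speaks Graphite's plaintext protocol, with one `<path> <value> <timestamp>` line per sample. Carbon's default plaintext port is 2003. The metric path juicefs.test.ping is hypothetical, and this is not how the SDK itself sends data.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one sample in Graphite's plaintext protocol."""
    if any(c.isspace() for c in path):
        raise ValueError("metric path must not contain whitespace")
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines, host="localhost", port=2003, timeout=5):
    """Send pre-formatted lines over one TCP connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall("".join(lines).encode("utf-8"))


print(graphite_line("juicefs.test.ping", 1, 1700000000), end="")  # juicefs.test.ping 1 1700000000
# Usage (placeholder address):
# send_to_graphite([graphite_line("juicefs.test.ping", 1)], "host", 2003)
```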

For all configuration parameters supported by the JuiceFS Hadoop Java SDK, see the documentation.

Using Consul as a registry

JuiceFS can use Consul as a registry for its metrics API. The default Consul address is 127.0.0.1:8500, and you can change it with the --consul option, for example:

$ juicefs mount --consul 1.2.3.4:8500 ...

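Once a Consul address is configured, you no longer need to set the --metrics option. JuiceFS configures the metrics URL automatically based on its own network and port. If --metrics is also set, JuiceFS first tries to listen on the configured URL.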

Every instance registered in Consul has the serviceName juicefs. Its serviceId has the format <IP>:<mount-point>, for example 127.0.0.1:/tmp/jfs.

The meta of each instance contains two fields, hostname and mountpoint. A mountpoint value of s3gateway indicates that the instance is an S3 gateway.
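You can list the registered instances through Consul's catalog HTTP API (GET /v1/catalog/service/juicefs), for example to check which hosts are reporting. The sketch below summarizes the catalog entries using the hostname and mountpoint meta fields described above. Building the scrape URL from ServiceAddress and ServicePort plus a /metrics path is an assumption about how JuiceFS registers itself; the text above does not state it.

```python
import json
import urllib.request


def fetch_catalog(consul="http://127.0.0.1:8500", timeout=5):
    """List every registered instance of the juicefs service."""
    with urllib.request.urlopen(f"{consul}/v1/catalog/service/juicefs", timeout=timeout) as resp:
        return json.load(resp)


def summarise(catalog):
    """Reduce catalog entries to the fields described in this section."""
    out = []
    for entry in catalog:
        meta = entry.get("ServiceMeta") or {}
        # ServiceAddress may be empty, in which case Consul uses the node Address
        address = entry.get("ServiceAddress") or entry.get("Address")
        out.append({
            "service_id": entry.get("ServiceID"),  # format <IP>:<mount-point>
            "hostname": meta.get("hostname"),
            "mountpoint": meta.get("mountpoint"),
            "is_s3_gateway": meta.get("mountpoint") == "s3gateway",
            # Assumption: metrics are served at http://<address>:<port>/metrics
            "metrics_url": f"http://{address}:{entry.get('ServicePort')}/metrics",
        })
    return out


# Usage (requires Consul at the default address):
# for inst in summarise(fetch_catalog()):
#     print(inst)
```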

Visualizing metrics

Grafana dashboard templates

JuiceFS provides several Grafana dashboard templates. After you import a template, it displays the collected metrics. The available templates are:

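Template name	Description
`grafana_template.json`	Shows metrics collected from mount points, S3 gateways (non-Kubernetes deployments), and the Hadoop Java SDK
`grafana_template_k8s.json`	Shows metrics collected from the Kubernetes CSI driver and S3 gateways (Kubernetes deployments)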

An example of the Grafana dashboard is shown below:

Summary

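Use Grafana as a high-level observation tool. When something goes wrong, first check it for abnormal metrics, then dig deeper. Set up alerts for key metrics as well, so that you are notified of abnormal system states in real time.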

If this was helpful, please follow our project Juicedata/JuiceFS! (0ᴗ0✿)
