关于运维:Grafana-系列文章一基于-Grafana-的全栈可观察性-Demo

📚️Reference:

https://github.com/grafana/in…

这是对于 Grafana 中可察看性的三个支柱的一系列演讲的配套资源库。

它以一个自我关闭的 Docker 沙盒的模式呈现，包含在本地机器上运行和实验所提供的服务所需的所有组件。

Docker
Docker Compose

这个系列的演示是基于这个资源库中的应用程序和代码，其中包含：

Docker Compose 清单，便于设置。
三种服务的利用：
- 一个从 REST API 服务器申请数据的服务。
- 一个接管申请的 REST API 服务器，并利用数据库来存储 / 检索这些申请的数据。
- 一个用于存储 / 检索数据的 Postgres 数据库。
Tempo 实例用于存储 trace 信息。
Loki 实例，用于存储日志信息。
普罗米修斯（Prometheus）实例，用于存储度量 (Metrics) 信息。
Grafana 实例，用于可视化可察看性信息。
Grafana Agent 实例，用于接管 trace，并依据这些 trace 产生度量和日志。
一个 Node Exporter 实例，用于从本地主机检索资源度量。

Docker Compose 将下载所需的 Docker 镜像，而后启动演示环境。数据将从微服务利用中发射进去，并存储在 Loki、Tempo 和 Prometheus 中。你能够登录到 Grafana 实例，将这些数据可视化。要执行环境并登录。

在你的操作系统中启动一个新的命令行界面并运行：
```
docker-compose up
```
登录到本地的 Grafana 实例，网址是：http://localhost:3000/ 留神：这是假如 3000 端口还没有被应用。如果这个端口没有闲暇，请编辑 docker-compose.yml 文件，并批改这一行
```
- "3000:3000"
```
到其余一些闲暇的主机端口，例如：
```
- "3123:3000"
```
拜访 MLT dashboard. (MLT: Metrics/Logging/Tracing)
应用 Grafana Explorer 拜访数据源。

🐾 留神：

对于中国区用户，能够在须要 build 的局部加上 proxy, 如下：

   mythical-requester:
    build:
      context: ./source
      dockerfile: docker/Dockerfile
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890
        SERVICE: mythical-beasts-requester
 
  mythical-server:
    build:
      context: ./source
      dockerfile: docker/Dockerfile
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890              
        SERVICE: mythical-beasts-server
 
  prometheus:
    build: 
      context: ./prometheus
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890

Grafana 是一个可视化工具，容许从各种数据源创立仪表盘。更多信息能够在这里找到。

Grafana 实例在 docker-compose.yml 清单的 grafana 局部有形容。

   # The Grafana dashboarding server.
  grafana:
    image: grafana/grafana
    volumes:
      - "./grafana/definitions:/var/lib/grafana/dashboards"
      - "./grafana/provisioning:/etc/grafana/provisioning"
    ports:
      - "3000:3000"
    environment:
      - GF_FEATURE_TOGGLES_ENABLE=tempoSearch,tempoServiceGraph

它：

挂载两个资源库目录，为数据提供预置的数据源 (./grafana/provisioning/datasources.yaml)。
预置的仪表盘，用于关联指标、日志和跟踪。(./grafana/definitions/mlt.yaml)
为本地登录提供 3000 端口。
启用两个 Tempo 性能，即跨度搜寻 (span search) 和服务图反对 (service graph support)。

不应用自定义配置。

📚️ Reference:

格拉法纳代理 | 格拉法纳实验室 (grafana.com)

「它通常用作跟踪管道，从应用程序卸载（offloading）跟踪并将其转发到存储后端。Grafana Agent 跟踪堆栈是应用 OpenTelemetry 构建的。」

「Grafana Agent 反对以多种格局接管跟踪：OTLP（OpenTelemetry），Jaeger，Zipkin 和 OpenCensus。」

从跨度生成指标 | 格拉法纳实验室 (grafana.com)

普罗米修斯是一个后盾存储和服务，用于从各种起源刮取（拉取）指标数据。更多信息能够在这里找到。此外，Mimir 是 Prometheus 数据的长期保留存储，对于它的信息能够在这里找到。

Prometheus 实例在 docker-compose.yml 清单的 prometheus 局部有形容。

   prometheus:
    build: 
      context: ./prometheus
      args:
        HTTP_PROXY: http://192.168.2.9:7890
        HTTPS_PROXY: http://192.168.2.9:7890    
    ports:
      - "9090:9090"

它是由 prometheus 目录下的一个批改过的 Dockerfile 构建的。这将配置文件复制到新的镜像中，并通过批改启动时应用的命令字符串来启用一些性能（包含 Exemplar 反对 – "--enable-feature=exemplar-storage"）。普罗米修斯在 9090 端口裸露其次要接口。

 global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
 
remote_read:
scrape_configs:
  # Scrape Prometheus' own metrics.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'prometheus'
 
  # Scrape from the Mythical Server service.
  - job_name: 'mythical-server'
    scrape_interval: 2s
    static_configs:
      - targets: ['mythical-server:4000']
        labels:
          group: 'mythical'
 
  # Scrape from the Mythical Requester service.
  - job_name: 'mythical-requester'
    scrape_interval: 2s
    static_configs:
      - targets: ['mythical-requester:4001']
        labels:
          group: 'mythical'
 
  # Scrape from the Node exporter, giving us resource usage.
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['nodeexporter:9100']
        labels:
          group: 'resources'
 
  # Scrape from Grafana Agent, giving us metrics from traces it collects.
  - job_name: 'span-metrics'
    scrape_interval: 2s
    static_configs:
      - targets: ['agent:12348']
        labels:
          group: 'mythical'
 
  # Scrape from Grafana Agent, giving us metrics from traces it collects.
  - job_name: 'agent-metrics'
    scrape_interval: 2s
    static_configs:
      - targets: ['agent:12345']
        labels:
          group: 'mythical'

配置文件（prometheus/prometheus.yml）定义了几个 scrape 工作，包含。

从 Prometheus 实例自身检索指标。(job_name: 'prometheus')
从微服务利用中获取指标。(job_name: 'mythical-server' 和 job_name: 'mythical-requester')
来自已装置的 Node Exporter 实例的指标。(job_name: 'node')
来自 Grafana Agent 的指标，由传入的跟踪数据得出。(job_name: 'span-metrics')

📚️References:

Exemplars storage | Prometheus Docs

「OpenMetrics 引入了刮取指标的能力，能够将范例 (Exemplars) 增加到特定的度量中。范例是对度量集之外的数据的援用。一个常见的用例是程序跟踪的 id。」

Loki 是一个用于长期保留日志的后端存储。更多信息能够在这里找到。

Loki 实例在 docker-compose.yml 清单的 loki 局部有形容。

   loki:
    image: grafana/loki
    ports:
      - "3100:3100"

这个实例只是可用的 latest loki 镜像，并在3100 端口裸露其接口。

微服务应用程序通过其 REST API 将其日志间接发送到该环境中的 Loki 实例。

Tempo 是一个用于长期保留 trace 的后端存储。更多信息能够在这里找到。

Tempo 实例在 docker-compose.yml 清单的 tempo 局部有形容。

Tempo 服务导入了一个配置文件（tempo/tempo.yaml），该文件用一些正当的默认值初始化服务，并容许接管各种不同格局的跟踪。

   tempo:
    image: grafana/tempo:1.2.1
    ports:
      - "3200:3200"
      - "4317:4317"
      - "55680:55680"
      - "55681:55681"
      - "14250:14250"
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo/tempo.yaml:/etc/tempo.yaml

 server:
  http_listen_port: 3200
 
distributor:
  receivers:                           # 此配置将监听 tempo 可能监听的所有端口和协定。jaeger:                            # 更多的配置信息能够从 OpenTelemetry 收集器中取得
      protocols:                       # 在这里：https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:                   #
        grpc:                          # 对于生产部署来说，你应该只启用你须要的接收器！thrift_binary:
        thrift_compact:
    otlp:
      protocols:
        http:
        grpc:
 
ingester:
  trace_idle_period: 10s               # 在一个追踪没有收到跨度后，认为它曾经实现并将其冲走的工夫长度。max_block_bytes: 1_000_000           # 当它达到这个尺寸时，切掉头块或。..
  max_block_duration: 5m               #   这么长时间
 
compactor:
  compaction:
    compaction_window: 1h              # 在这个工夫窗口中的块将被压缩在一起
    max_block_bytes: 100_000_000       # 压实块的最大尺寸
    block_retention: 1h
    compacted_block_retention: 10m
 
storage:
  trace:
    backend: local                     # 应用的后端配置
    block:
      bloom_filter_false_positive: .05 # 较低的值会产生较大的过滤器，但会产生较少的假阳性后果。index_downsample_bytes: 1000     # 每条索引记录的字节数
      encoding: zstd                   # 块编码 / 压缩。选项：none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    wal:
      path: /tmp/tempo/wal             # 在本地存储 wal 的中央
      encoding: none                   # wal 编码 / 压缩。选项：none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
    local:
      path: /tmp/tempo/blocks
    pool:
      max_workers: 100                 # worker 池决定了对对象存储后盾的并行申请的数量
      queue_depth: 10000
 
search_enabled: true

Grafana Agent 是一个本地装置的代理，充当：

Prometheus 刮削服务。
Tempo 后端服务接收器 (backend service receiver) 和跟踪跨度处理器 (trace span processor)。
一个 Promtail（Loki 日志接收器）实例。

Grafana Agent 具备近程写入性能，容许它将指标、日志和跟踪数据发送到后端存储（如 Mimir、Loki 和 Tempo）。对于 Grafana Agent 的更多信息能够在这里找到。

它在这个环境中的次要作用是接管来自微服务利用的跟踪跨度 (trace span)，并解决它们以提取指标和日志信息，而后将它们存储到最终的后端存储。

它的配置文件能够在 agent/config.yaml 中找到。

   agent:
    image: grafana/agent:v0.24.0
    ports:
      - "12347:12345"
      - "12348:12348"
      - "6832:6832"
      - "55679:55679"
    volumes:
      - "${PWD}/agent/config.yaml:/etc/agent/agent.yaml"
    command: [
      "-config.file=/etc/agent/agent.yaml",
      "-server.http.address=0.0.0.0:12345",
    ]

 server:
  log_level: debug
 
# 配置一个日志摄取端点，用于自动记录性能。logs:
    configs:
    - name: loki
      clients:
        - url: http://loki:3100/loki/api/v1/push
          external_labels:
            job: agent
    positions_directory: /tmp/positions
 
# 配置一个 Tempo 实例来接管来自微服务的追踪。traces:
  configs:
  - name: latencyEndpoint
    # 在 6832 端口接管 Jaeger 格局的追踪信息。receivers:
      jaeger:
        protocols:
          thrift_binary:
            endpoint: "0.0.0.0:6832"
    # 向 Tempo 实例发送成批的跟踪数据。remote_write:
      - endpoint: tempo:55680
        insecure: true
    # 从传入的跟踪跨度生成普罗米修斯指标。spanmetrics:
      # 增加 http.target 和 http.method span 标签作为度量数据的标签。dimensions:
        - name: http.method
        - name: http.target
      # 在 12348 端口裸露这些指标。handler_endpoint: 0.0.0.0:12348
    # 从传入的跟踪数据中主动生成日志。automatic_logging:
      # 应用在配置文件开始时定义的日志实例。backend: logs_instance
      logs_instance_name: loki
      # 每个根跨度记录一行（即每个跟踪记录一行）。roots: true
      processes: false
      spans: false
      # 在日志行中增加 http.method、http.target 和 http.status_code span 标签。如果有的话。span_attributes:
        - http.method
        - http.target
        - http.status_code
      # 强制将跟踪 ID 设置为 `traceId`。overrides:
        trace_id_key: "traceId"
    # 启用服务图。service_graphs:
      enabled: true

英文	中文	备注
Exemplars	范例
Derived fields	衍生字段
Metrics	度量
Logging	日志
Tracing	跟踪
observability	可察看性
span search	跨度搜寻	Tempo 性能 – 须要 Grafana Agent
service graph	服务图反对	Tempo 性能 – 须要 Grafana Agent
scrape	刮削	Prometheus 词汇

Grafana 系列文章

三人行, 必有我师; 常识共享, 天下为公. 本文由东风微鸣技术博客 EWhisper.cn 编写.

关于运维:Grafana-系列文章一基于-Grafana-的全栈可观察性-Demo

Grafana 全栈可察看性产品

具体的可察看性转换图

前提

概述

运行演示环境

Grafana

Prometheus

Loki

Tempo

Grafana Agent

词汇表

Grafana 系列文章

Just My Socks（注册教程内含优惠码）

	mythical-requester:
	build:
	context: ./source
	dockerfile: docker/Dockerfile
	args:
	HTTP_PROXY: http://192.168.2.9:7890
	HTTPS_PROXY: http://192.168.2.9:7890
	SERVICE: mythical-beasts-requester

	mythical-server:
	build:
	context: ./source
	dockerfile: docker/Dockerfile
	args:
	HTTP_PROXY: http://192.168.2.9:7890
	HTTPS_PROXY: http://192.168.2.9:7890
	SERVICE: mythical-beasts-server

	prometheus:
	build:
	context: ./prometheus
	args:
	HTTP_PROXY: http://192.168.2.9:7890
	HTTPS_PROXY: http://192.168.2.9:7890

	# The Grafana dashboarding server.
	grafana:
	image: grafana/grafana
	volumes:
	- "./grafana/definitions:/var/lib/grafana/dashboards"
	- "./grafana/provisioning:/etc/grafana/provisioning"
	ports:
	- "3000:3000"
	environment:
	- GF_FEATURE_TOGGLES_ENABLE=tempoSearch,tempoServiceGraph

	global:
	scrape_interval: 15s # By default, scrape targets every 15 seconds.

	remote_read:
	scrape_configs:
	# Scrape Prometheus' own metrics.
	- job_name: 'prometheus'
	static_configs:
	- targets: ['localhost:9090']
	labels:
	group: 'prometheus'

	# Scrape from the Mythical Server service.
	- job_name: 'mythical-server'
	scrape_interval: 2s
	static_configs:
	- targets: ['mythical-server:4000']
	labels:
	group: 'mythical'

	# Scrape from the Mythical Requester service.
	- job_name: 'mythical-requester'
	scrape_interval: 2s
	static_configs:
	- targets: ['mythical-requester:4001']
	labels:
	group: 'mythical'

	# Scrape from the Node exporter, giving us resource usage.
	- job_name: 'node'
	scrape_interval: 5s
	static_configs:
	- targets: ['nodeexporter:9100']
	labels:
	group: 'resources'

	# Scrape from Grafana Agent, giving us metrics from traces it collects.
	- job_name: 'span-metrics'
	scrape_interval: 2s
	static_configs:
	- targets: ['agent:12348']
	labels:
	group: 'mythical'

	# Scrape from Grafana Agent, giving us metrics from traces it collects.
	- job_name: 'agent-metrics'
	scrape_interval: 2s
	static_configs:
	- targets: ['agent:12345']
	labels:
	group: 'mythical'

	tempo:
	image: grafana/tempo:1.2.1
	ports:
	- "3200:3200"
	- "4317:4317"
	- "55680:55680"
	- "55681:55681"
	- "14250:14250"
	command: ["-config.file=/etc/tempo.yaml"]
	volumes:
	- ./tempo/tempo.yaml:/etc/tempo.yaml

	server:
	http_listen_port: 3200

	distributor:
	receivers: # 此配置将监听 tempo 可能监听的所有端口和协定。jaeger: # 更多的配置信息能够从 OpenTelemetry 收集器中取得
	protocols: # 在这里：https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
	thrift_http: #
	grpc: # 对于生产部署来说，你应该只启用你须要的接收器！thrift_binary:
	thrift_compact:
	otlp:
	protocols:
	http:
	grpc:

	ingester:
	trace_idle_period: 10s # 在一个追踪没有收到跨度后，认为它曾经实现并将其冲走的工夫长度。max_block_bytes: 1_000_000 # 当它达到这个尺寸时，切掉头块或。..
	max_block_duration: 5m # 这么长时间

	compactor:
	compaction:
	compaction_window: 1h # 在这个工夫窗口中的块将被压缩在一起
	max_block_bytes: 100_000_000 # 压实块的最大尺寸
	block_retention: 1h
	compacted_block_retention: 10m

	storage:
	trace:
	backend: local # 应用的后端配置
	block:
	bloom_filter_false_positive: .05 # 较低的值会产生较大的过滤器，但会产生较少的假阳性后果。index_downsample_bytes: 1000 # 每条索引记录的字节数
	encoding: zstd # 块编码 / 压缩。选项：none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
	wal:
	path: /tmp/tempo/wal # 在本地存储 wal 的中央
	encoding: none # wal 编码 / 压缩。选项：none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd
	local:
	path: /tmp/tempo/blocks
	pool:
	max_workers: 100 # worker 池决定了对对象存储后盾的并行申请的数量
	queue_depth: 10000

	search_enabled: true

	agent:
	image: grafana/agent:v0.24.0
	ports:
	- "12347:12345"
	- "12348:12348"
	- "6832:6832"
	- "55679:55679"
	volumes:
	- "${PWD}/agent/config.yaml:/etc/agent/agent.yaml"
	command: [
	"-config.file=/etc/agent/agent.yaml",
	"-server.http.address=0.0.0.0:12345",
	]

	server:
	log_level: debug

	# 配置一个日志摄取端点，用于自动记录性能。logs:
	configs:
	- name: loki
	clients:
	- url: http://loki:3100/loki/api/v1/push
	external_labels:
	job: agent
	positions_directory: /tmp/positions

	# 配置一个 Tempo 实例来接管来自微服务的追踪。traces:
	configs:
	- name: latencyEndpoint
	# 在 6832 端口接管 Jaeger 格局的追踪信息。receivers:
	jaeger:
	protocols:
	thrift_binary:
	endpoint: "0.0.0.0:6832"
	# 向 Tempo 实例发送成批的跟踪数据。remote_write:
	- endpoint: tempo:55680
	insecure: true
	# 从传入的跟踪跨度生成普罗米修斯指标。spanmetrics:
	# 增加 http.target 和 http.method span 标签作为度量数据的标签。dimensions:
	- name: http.method
	- name: http.target
	# 在 12348 端口裸露这些指标。handler_endpoint: 0.0.0.0:12348
	# 从传入的跟踪数据中主动生成日志。automatic_logging:
	# 应用在配置文件开始时定义的日志实例。backend: logs_instance
	logs_instance_name: loki
	# 每个根跨度记录一行（即每个跟踪记录一行）。roots: true
	processes: false
	spans: false
	# 在日志行中增加 http.method、http.target 和 http.status_code span 标签。如果有的话。span_attributes:
	- http.method
	- http.target
	- http.status_code
	# 强制将跟踪 ID 设置为 `traceId`。overrides:
	trace_id_key: "traceId"
	# 启用服务图。service_graphs:
	enabled: true

关于运维:Grafana-系列文章一基于-Grafana-的全栈可观察性-Demo

Grafana 全栈可察看性产品

具体的可察看性转换图

前提

概述

运行演示环境

Grafana

Prometheus

Loki

Tempo

Grafana Agent

词汇表

Grafana 系列文章

Just My Socks（注册教程 内含优惠码）

Just My Socks（注册教程内含优惠码）