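prometheus-operator provides a Probe CRD for black-box monitoring; the actual probing is carried out by blackbox-exporter.
blackbox-exporter is the Prometheus community's black-box monitoring solution. It lets users probe a target over HTTP, HTTPS, TCP, ICMP, and other protocols.
I. Overall architecture
In practice:
- First, the user creates a Probe CRD object that specifies the probe module, the probe targets, and other parameters;
- Then, prometheus-operator sees the new Probe object through its watch, generates the corresponding Prometheus scrape configuration, and reloads it into Prometheus;
- Finally, Prometheus scrapes blackbox-exporter at url=/probe?target={probe target}&module={probe module}. On each scrape, blackbox-exporter probes the target and returns the result in metrics format.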
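The scrape URL in the last step can be sketched in Go as follows. This is a minimal illustration, not code from Prometheus itself: the function name buildProbeURL is hypothetical, and the prober address, target, and module are simply the values used later in this article.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildProbeURL assembles the URL that is scraped on blackbox-exporter.
// proberAddr is the host:port of blackbox-exporter, target is what should
// be probed, and module names a module from the blackbox-exporter config.
// (Hypothetical helper, for illustration only.)
func buildProbeURL(proberAddr, target, module string) string {
	q := url.Values{}
	q.Set("target", target)
	q.Set("module", module)
	u := url.URL{
		Scheme:   "http",
		Host:     proberAddr,
		Path:     "/probe",
		RawQuery: q.Encode(), // Encode sorts parameters by key and escapes values
	}
	return u.String()
}

func main() {
	fmt.Println(buildProbeURL("blackbox-exporter.monitoring:19115", "prometheus.io", "http_2xx"))
}
```

Because the query string is built with url.Values, a target that itself contains a scheme or query characters (for example https://www.baidu.com) is escaped correctly.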
II. Deploying prometheus-operator
Deploy prometheus-operator with kube-prometheus.
# git clone -b release-0.8 git@github.com:prometheus-operator/kube-prometheus.git
# cd kube-prometheus
First, deploy the CRDs:
# kubectl apply -f manifests/setup
# kubectl get crd |grep coreos
alertmanagerconfigs.monitoring.coreos.com   2022-05-19T06:44:00Z
alertmanagers.monitoring.coreos.com         2022-05-19T06:44:01Z
podmonitors.monitoring.coreos.com           2022-05-19T06:44:01Z
probes.monitoring.coreos.com                2022-05-19T06:44:01Z
prometheuses.monitoring.coreos.com          2022-05-19T06:45:04Z
prometheusrules.monitoring.coreos.com       2022-05-19T06:44:01Z
servicemonitors.monitoring.coreos.com       2022-05-19T06:44:01Z
thanosrulers.monitoring.coreos.com          2022-05-19T06:44:02Z
As the output shows, the probes.monitoring.coreos.com CRD has been deployed.
Next, deploy prometheus-operator:
# kubectl apply -f manifests/
# kubectl get pods -n monitoring
NAME                                  READY   STATUS    RESTARTS      AGE
alertmanager-main-0                   2/2     Running   0             46m
alertmanager-main-1                   2/2     Running   0             46m
alertmanager-main-2                   2/2     Running   0             46m
blackbox-exporter-5cb5d7479d-mznws    3/3     Running   0             49m
grafana-d595885ff-cf49m               1/1     Running   0             49m
kube-state-metrics-685d769786-tkv7l   3/3     Running   0             22m
node-exporter-4d6mq                   2/2     Running   0             49m
node-exporter-8cr4v                   2/2     Running   0             49m
node-exporter-krr2h                   2/2     Running   0             49m
prometheus-adapter-6fd94587c9-6tsgb   0/1     Running   0             3s
prometheus-adapter-6fd94587c9-8zm2l   1/1     Running   4 (13m ago)   13m
prometheus-k8s-0                      2/2     Running   0             46m
prometheus-k8s-1                      2/2     Running   0             46m
prometheus-operator-7684989c7-qt2sp   2/2     Running   0             49m
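Once deployment completes, expose the prometheus-k8s Service through a NodePort so that the Prometheus UI can be reached.
III. Configuring blackbox-exporter
blackbox-exporter requires a configuration file at startup.
The configuration file lists the probes that blackbox-exporter supports, such as icmp and tcp:
- each probe configuration is called a module and is written in YAML;
- each module contains:
  - the probe type: prober
  - the timeout: timeout
  - …
A typical blackbox-exporter configuration file: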
apiVersion: v1
data:
  config.yml: |-
    "modules":
      "http_2xx":                       # module name
        "http":
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "http_post_2xx":
        "http":
          "method": "POST"              # POST request
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "tcp_connect":                    # TCP connection
        "prober": "tcp"
        "timeout": "10s"
        "tcp":
          "preferred_ip_protocol": "ip4"
      "dns":
        "prober": "dns"
        "dns":
          "transport_protocol": "udp"
          "preferred_ip_protocol": "ip4"
          "query_name": "kubernetes.default.svc.cluster.local"
      "icmp":
        "prober": "icmp"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.18.0
  name: blackbox-exporter-configuration
  namespace: monitoring
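The relationship between a module name and its settings can be pictured as a simple lookup table. The Go sketch below is only a simplified model under stated assumptions: the Module type, the modules map, and lookupModule are hypothetical names, not blackbox-exporter's real types, and only the prober and timeout fields from the configuration above are modeled.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Module is a simplified, hypothetical stand-in for one entry under "modules".
type Module struct {
	Prober  string        // probe type: http, tcp, dns, icmp
	Timeout time.Duration // zero means "not set in the config"
}

// modules mirrors part of the ConfigMap shown above.
var modules = map[string]Module{
	"http_2xx":    {Prober: "http"},
	"tcp_connect": {Prober: "tcp", Timeout: 10 * time.Second},
	"icmp":        {Prober: "icmp"},
}

var errUnknownModule = errors.New("unknown module")

// lookupModule resolves the module name passed in a probe request.
func lookupModule(name string) (Module, error) {
	m, ok := modules[name]
	if !ok {
		return Module{}, fmt.Errorf("%w: %q", errUnknownModule, name)
	}
	return m, nil
}

func main() {
	for _, name := range []string{"tcp_connect", "no_such_module"} {
		m, err := lookupModule(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> prober=%s timeout=%s\n", name, m.Prober, m.Timeout)
	}
}
```

The point of the sketch is that the module value in a Probe object must match a key under "modules"; a name that is not defined there cannot be probed.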
IV. Creating Probe objects
1. Probe ping
Create a ping job:
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: ping
  namespace: monitoring
spec:
  jobName: ping                     # job name
  prober:                           # address of blackbox-exporter
    url: blackbox-exporter.monitoring:19115
  module: icmp                      # probe module defined in the configuration file
  targets:                          # targets: either staticConfig or ingress
    staticConfig:                   # takes precedence if ingress is also configured
      static:
      - https://www.baidu.com       # the icmp prober resolves the target as a host name, so a bare host such as www.baidu.com is likely the safer form
After a short wait, the job appears on the Prometheus page.
The corresponding configuration generated in Prometheus:
- job_name: probe/monitoring/ping
  honor_timestamps: true
  params:
    module:
    - icmp
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: ping
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter.monitoring:19115
    action: replace
  static_configs:
  - targets:
    - https://www.baidu.com
    labels:
      namespace: monitoring
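The five relabel rules above are easier to follow when written as plain assignments. The Go sketch below is a simplified model, not Prometheus's relabeling engine: applyProbeRelabeling is a hypothetical function, and it assumes every rule uses regex (.*) with action replace, as in the generated configuration.

```go
package main

import "fmt"

// applyProbeRelabeling mimics the generated relabel_configs in order:
//  1. job            -> __tmp_prometheus_job_name
//  2. jobName        -> job
//  3. __address__    -> __param_target (becomes the ?target= parameter)
//  4. __param_target -> instance
//  5. proberAddr     -> __address__ (the scrape now goes to blackbox-exporter)
//
// It returns a new map and leaves the input untouched.
func applyProbeRelabeling(labels map[string]string, jobName, proberAddr string) map[string]string {
	out := make(map[string]string, len(labels)+4)
	for k, v := range labels {
		out[k] = v
	}
	out["__tmp_prometheus_job_name"] = out["job"]
	out["job"] = jobName
	out["__param_target"] = out["__address__"]
	out["instance"] = out["__param_target"]
	out["__address__"] = proberAddr
	return out
}

func main() {
	// Labels of a static target before relabeling (illustrative values).
	before := map[string]string{
		"job":         "probe/monitoring/ping",
		"__address__": "https://www.baidu.com",
		"namespace":   "monitoring",
	}
	after := applyProbeRelabeling(before, "ping", "blackbox-exporter.monitoring:19115")
	for _, k := range []string{"job", "instance", "__param_target", "__address__"} {
		fmt.Printf("%s=%s\n", k, after[k])
	}
}
```

The net effect is that the original target survives as the instance label and the target query parameter, while the address actually scraped is replaced with that of blackbox-exporter.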
2. Probe HTTP
Create an HTTP job:
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: domain-probe
  namespace: monitoring
spec:
  jobName: domain-probe             # job name
  prober:                           # address of blackbox-exporter
    url: blackbox-exporter:19115
  module: http_2xx                  # probe module defined in the configuration file
  targets:                          # targets: either staticConfig or ingress
    staticConfig:                   # takes precedence if ingress is also configured
      static:
      - prometheus.io
After a short wait, the job appears on the Prometheus page.
The corresponding configuration generated in Prometheus:
- job_name: probe/monitoring/domain-probe
  honor_timestamps: true
  params:
    module:
    - http_2xx
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: domain-probe
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter:19115
    action: replace
  static_configs:
  - targets:
    - prometheus.io
    labels:
      namespace: monitoring
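3. Inspecting the scraped metrics
You can send a request to blackbox-exporter with curl, passing the probe module and the probe target. blackbox-exporter runs the probe and returns the result in metrics format:
curl 'http://192.168.0.1:31392/probe?target=prometheus.io&module=http_2xx'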
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.275433879
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 2.373368898
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.400100412
probe_http_duration_seconds{phase="processing"} 0.509387522
probe_http_duration_seconds{phase="resolve"} 0.365111732
probe_http_duration_seconds{phase="tls"} 1.200170298
probe_http_duration_seconds{phase="transfer"} 0.000451343
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 15757
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 2
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.590428662e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.686095999e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.686095999e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="99ac7e7bf8d38ce32c95b2b3c965a9d2b479b0bf2e3b40c576173131a249f877"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.3"} 1
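To check a single value such as probe_success without Prometheus, the text output can be scanned line by line. The Go sketch below is a deliberately small illustration using only the standard library: metricValue is a hypothetical helper, it handles only samples without labels, and it is not a full parser for the exposition format.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue scans metrics text and returns the value of the first
// label-less sample whose name equals name. Comment lines (# HELP, # TYPE)
// and samples with labels, such as probe_http_duration_seconds{phase="tls"},
// are skipped because their first field never equals a bare metric name.
func metricValue(body, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	// A shortened copy of the output shown above.
	body := `# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
`
	if v, ok := metricValue(body, "probe_success"); ok {
		fmt.Println("probe succeeded:", v == 1)
	}
}
```

In a real setup the same check is normally done in PromQL or an alerting rule on probe_success; the sketch only shows how the response body is structured.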
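V. Source code analysis of Probe
prometheus-operator handles the Probe CRD in much the same way as its other CRDs:
- first, an informer watches Probe CRD objects for changes;
- then, the operator generates a new Prometheus configuration from the changed object and reloads it into Prometheus.
1. Watching Probe CRD objects
An informer watches Probe CRD objects for changes.
First, create the informer: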
// prometheus-operator/pkg/prometheus/operator.go
// New creates a new controller.
func New(ctx context.Context, conf operator.Config, logger log.Logger, r prometheus.Registerer) (*Operator, error) {
	...
	c := &Operator{...}
	...
	c.probeInfs, err = informers.NewInformersForResource(
		informers.NewMonitoringInformerFactories(
			c.config.Namespaces.AllowList,
			c.config.Namespaces.DenyList,
			mclient,
			resyncPeriod,
			nil,
		),
		monitoringv1.SchemeGroupVersion.WithResource(monitoringv1.ProbeName),
	)
	if err != nil {
		return nil, errors.Wrap(err, "error creating probe informers")
	}
	...
	return c, nil
}
Then, register event handlers on the informer:
// prometheus-operator/pkg/prometheus/operator.go
// addHandlers adds the eventhandlers to the informers.
func (c *Operator) addHandlers() {
	...
	c.probeInfs.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    c.handleBmonAdd,
		UpdateFunc: c.handleBmonUpdate,
		DeleteFunc: c.handleBmonDelete,
	})
	...
}
Take the Add event handler as an example:
- it enqueues the namespace that the object belongs to;
// TODO: Don't enqueue just for the namespace
func (c *Operator) handleBmonAdd(obj interface{}) {
	if o, ok := c.getObject(obj); ok {
		level.Debug(c.logger).Log("msg", "Probe added")
		c.metrics.TriggerByCounter(monitoringv1.ProbesKind, "add").Inc()
		c.enqueueForMonitorNamespace(o.GetNamespace())
	}
}
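The idea of enqueueing by namespace rather than by individual Probe can be illustrated with a toy queue. The Go sketch below is an assumption-laden model, not the operator's implementation: nsQueue and its methods are hypothetical, it is not safe for concurrent use, and the real operator relies on its own work queue.

```go
package main

import "fmt"

// nsQueue is a hypothetical FIFO of namespaces in which a namespace that is
// already waiting is not added a second time.
type nsQueue struct {
	pending map[string]bool
	order   []string
}

func newNSQueue() *nsQueue {
	return &nsQueue{pending: map[string]bool{}}
}

// enqueue adds ns unless it is already waiting to be processed.
func (q *nsQueue) enqueue(ns string) {
	if q.pending[ns] {
		return
	}
	q.pending[ns] = true
	q.order = append(q.order, ns)
}

// dequeue returns the oldest waiting namespace, or false if the queue is empty.
func (q *nsQueue) dequeue() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	ns := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, ns)
	return ns, true
}

func main() {
	q := newNSQueue()
	// Three Probe events, two of them in the same namespace.
	for _, ns := range []string{"monitoring", "monitoring", "default"} {
		q.enqueue(ns)
	}
	for {
		ns, ok := q.dequeue()
		if !ok {
			break
		}
		fmt.Println("reconcile namespace:", ns)
	}
}
```

With this shape, a burst of Probe events in one namespace leads to a single pending reconcile for that namespace instead of one per event.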
2. Generating the Prometheus configuration
In prometheus-operator, worker goroutines take changed objects from the queue and reconcile them.
// prometheus-operator/pkg/prometheus/operator.go
func (c *Operator) sync(ctx context.Context, key string) error {
	...
	// Probe objects are handled here
	if err := c.createOrUpdateConfigurationSecret(ctx, p, ruleConfigMapNames, assetStore); err != nil {
		return errors.Wrap(err, "creating config failed")
	}
	...
}
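For Probe objects, the operator generates the Prometheus configuration from their contents and writes it to a Secret.
In other words, the Prometheus configuration is stored in a Secret object, and the reloader sidecar then reloads the Secret's contents into Prometheus.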
func (c *Operator) createOrUpdateConfigurationSecret(ctx context.Context, p *monitoringv1.Prometheus, ruleConfigMapNames []string, store *assets.Store) error {
	...
	// select the Probe objects
	bmons, err := c.selectProbes(ctx, p, store)
	if err != nil {
		return errors.Wrap(err, "selecting Probes failed")
	}
	...
	// generate the new configuration
	conf, err := c.configGenerator.generateConfig(
		p,
		smons,
		pmons,
		bmons,
		store.BasicAuthAssets,
		store.BearerTokenAssets,
		additionalScrapeConfigs,
		additionalAlertRelabelConfigs,
		additionalAlertManagerConfigs,
		ruleConfigMapNames,
	)
	if err != nil {
		return errors.Wrap(err, "generating config failed")
	}
	// write the configuration to the Secret object
	s := makeConfigSecret(p, c.config)
	...
}
The Prometheus configuration for a single Probe object is generated as follows:
// pkg/prometheus/promcfg.go
func (cg *configGenerator) generateProbeConfig(
	version semver.Version,
	m *v1.Probe,
	apiserverConfig *v1.APIServerConfig,
	basicAuthSecrets map[string]assets.BasicAuthCredentials,
	bearerTokens map[string]assets.BearerToken,
	ignoreHonorLabels bool,
	overrideHonorTimestamps bool,
	ignoreNamespaceSelectors bool,
	enforcedNamespaceLabel string) yaml.MapSlice {

	jobName := fmt.Sprintf("probe/%s/%s", m.Namespace, m.Name)
	cfg := yaml.MapSlice{
		{
			Key:   "job_name",
			Value: jobName,
		},
	}
	...
	// metrics_path
	path := "/probe"
	if m.Spec.ProberSpec.Path != "" {
		path = m.Spec.ProberSpec.Path
	}
	cfg = append(cfg, yaml.MapItem{Key: "metrics_path", Value: path})
	...
	// params
	cfg = append(cfg, yaml.MapItem{Key: "params", Value: yaml.MapSlice{
		{Key: "module", Value: []string{m.Spec.Module}},
	}})
	...
	// static_configs
	if m.Spec.Targets.StaticConfig != nil {
		staticConfig := yaml.MapSlice{
			{Key: "targets", Value: m.Spec.Targets.StaticConfig.Targets},
		}
		// default the "namespace" label to the Probe's own namespace
		if m.Spec.Targets.StaticConfig.Labels != nil {
			if _, ok := m.Spec.Targets.StaticConfig.Labels["namespace"]; !ok {
				m.Spec.Targets.StaticConfig.Labels["namespace"] = m.Namespace
			}
		} else {
			m.Spec.Targets.StaticConfig.Labels = map[string]string{"namespace": m.Namespace}
		}
		staticConfig = append(staticConfig, yaml.MapSlice{
			{Key: "labels", Value: m.Spec.Targets.StaticConfig.Labels},
		}...)
		cfg = append(cfg, yaml.MapItem{
			Key:   "static_configs",
			Value: []yaml.MapSlice{staticConfig},
		})
		...
	}
	...
	return cfg
}
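Two details of the excerpt above, the job name format and the defaulting of the namespace label, can be tried in isolation. The Go sketch below is a standalone illustration: probeJobName and staticLabels are hypothetical helpers, and unlike the operator code, staticLabels returns a copy instead of modifying the Probe spec in place.

```go
package main

import "fmt"

// probeJobName builds the job_name used for a Probe: probe/<namespace>/<name>.
func probeJobName(namespace, name string) string {
	return fmt.Sprintf("probe/%s/%s", namespace, name)
}

// staticLabels returns a copy of labels in which "namespace" defaults to the
// Probe's namespace. A "namespace" label set by the user is kept as it is.
func staticLabels(labels map[string]string, probeNamespace string) map[string]string {
	out := map[string]string{"namespace": probeNamespace}
	for k, v := range labels {
		out[k] = v // user-supplied labels, including "namespace", win
	}
	return out
}

func main() {
	fmt.Println(probeJobName("monitoring", "domain-probe"))
	fmt.Println(staticLabels(nil, "monitoring"))
	fmt.Println(staticLabels(map[string]string{"namespace": "prod", "team": "web"}, "monitoring"))
}
```

This matches the generated configurations shown earlier, where job_name is probe/monitoring/ping or probe/monitoring/domain-probe and static_configs carries the label namespace: monitoring.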