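prometheus-operator provides a Probe CRD that can be used for black-box monitoring; the actual probing is done by blackbox-exporter.

blackbox-exporter is the black-box monitoring solution from the Prometheus community. It lets users probe a target over the network via HTTP, HTTPS, TCP, ICMP, and other protocols.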

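I. Overall architecture

In practice:

  • First, the user creates a Probe CRD object that specifies the probe module, the probe targets, and other parameters;
  • Next, prometheus-operator sees the new Probe object through its watch, generates the corresponding Prometheus scrape configuration, and reloads it into Prometheus;
  • Finally, Prometheus scrapes blackbox-exporter at url=/probe?target={probe target}&module={probe module}; blackbox-exporter then probes the target and returns the result in metrics format.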
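
The last step is easier to picture with the URL spelled out. Below is a minimal Go sketch of how such a scrape URL could be assembled; `probeURL` is a hypothetical helper written for this article, not code from Prometheus or the operator, and it assumes the plain-HTTP scheme seen in the generated configuration later on.

```go
package main

import (
	"fmt"
	"net/url"
)

// probeURL builds the URL Prometheus would request from blackbox-exporter.
// proberAddr is a host:port such as the prober.url field of a Probe object.
// Hypothetical helper for illustration only.
func probeURL(proberAddr, target, module string) string {
	q := url.Values{}
	q.Set("target", target)
	q.Set("module", module)
	u := url.URL{
		Scheme:   "http", // assumption: matches "scheme: http" in the generated config
		Host:     proberAddr,
		Path:     "/probe",
		RawQuery: q.Encode(), // Encode sorts keys and escapes values
	}
	return u.String()
}

func main() {
	// Prints: http://blackbox-exporter.monitoring:19115/probe?module=http_2xx&target=prometheus.io
	fmt.Println(probeURL("blackbox-exporter.monitoring:19115", "prometheus.io", "http_2xx"))
}
```

Using `url.Values` rather than string concatenation matters once the target itself is a URL: characters such as `:` and `/` are escaped, so the target cannot be confused with the rest of the query string.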

II. Deploying prometheus-operator

Use kube-prometheus to deploy prometheus-operator.

# git clone -b release-0.8 git@github.com:prometheus-operator/kube-prometheus.git
# cd kube-prometheus

First, deploy the CRDs:

# kubectl apply -f manifests/setup
# kubectl get crd |grep coreos
alertmanagerconfigs.monitoring.coreos.com            2022-05-19T06:44:00Z
alertmanagers.monitoring.coreos.com                  2022-05-19T06:44:01Z
podmonitors.monitoring.coreos.com                    2022-05-19T06:44:01Z
probes.monitoring.coreos.com                         2022-05-19T06:44:01Z
prometheuses.monitoring.coreos.com                   2022-05-19T06:45:04Z
prometheusrules.monitoring.coreos.com                2022-05-19T06:44:01Z
servicemonitors.monitoring.coreos.com                2022-05-19T06:44:01Z
thanosrulers.monitoring.coreos.com                   2022-05-19T06:44:02Z

As the output shows, the probes.monitoring.coreos.com CRD has been installed.

Then deploy prometheus-operator:

# kubectl apply -f manifests/
# kubectl get pods -n monitoring
NAME                                  READY   STATUS    RESTARTS      AGE
alertmanager-main-0                   2/2     Running   0             46m
alertmanager-main-1                   2/2     Running   0             46m
alertmanager-main-2                   2/2     Running   0             46m
blackbox-exporter-5cb5d7479d-mznws    3/3     Running   0             49m
grafana-d595885ff-cf49m               1/1     Running   0             49m
kube-state-metrics-685d769786-tkv7l   3/3     Running   0             22m
node-exporter-4d6mq                   2/2     Running   0             49m
node-exporter-8cr4v                   2/2     Running   0             49m
node-exporter-krr2h                   2/2     Running   0             49m
prometheus-adapter-6fd94587c9-6tsgb   0/1     Running   0             3s
prometheus-adapter-6fd94587c9-8zm2l   1/1     Running   4 (13m ago)   13m
prometheus-k8s-0                      2/2     Running   0             46m
prometheus-k8s-1                      2/2     Running   0             46m
prometheus-operator-7684989c7-qt2sp   2/2     Running   0             49m

Once the deployment is complete, configure a NodePort on the prometheus-k8s Service so that the Prometheus UI can be reached.

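III. Blackbox-exporter configuration

Blackbox-exporter needs a configuration file when it starts.

The configuration file lists the probes that blackbox-exporter supports, such as icmp and tcp, where:

  • each probe configuration is called a module and is written in YAML;
  • each module contains:

    • the probe type: prober
    • the timeout: timeout
    • ...

A typical blackbox-exporter configuration file: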

apiVersion: v1
data:
  config.yml: |-
    "modules":
      "http_2xx":                # module name
        "http":
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "http_post_2xx":
        "http":
          "method": "POST"        # POST request
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "tcp_connect":            # TCP connection
        "prober": "tcp"
        "timeout": "10s"
        "tcp":
          "preferred_ip_protocol": "ip4"
      "dns":
        "prober": "dns"
        "dns":
          "transport_protocol": "udp"
          "preferred_ip_protocol": "ipv4"
          "query_name": "kubernetes.default.svc.cluster.local"
      "icmp":
        "prober": "icmp"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.18.0
  name: blackbox-exporter-configuration
  namespace: monitoring
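
To make the "module" idea concrete, here is a small Go sketch of how a module name from the `module=` query parameter might be resolved against such a configuration. The `Module` type, the `modules` table and `lookupModule` are all hypothetical simplifications for this article; they model only the prober and timeout fields mentioned above and are not blackbox-exporter's real types.

```go
package main

import (
	"fmt"
	"time"
)

// Module is a simplified stand-in for one entry under "modules".
// Only the two fields named in the text are modelled.
type Module struct {
	Prober  string
	Timeout time.Duration // zero means "not set in the config"
}

// modules mirrors a subset of the ConfigMap shown above (hand-copied).
var modules = map[string]Module{
	"http_2xx":    {Prober: "http"},
	"tcp_connect": {Prober: "tcp", Timeout: 10 * time.Second},
	"icmp":        {Prober: "icmp"},
}

// lookupModule resolves a module name to its configuration and
// reports an error for names that are not configured.
func lookupModule(name string) (Module, error) {
	m, ok := modules[name]
	if !ok {
		return Module{}, fmt.Errorf("unknown module %q", name)
	}
	return m, nil
}

func main() {
	m, err := lookupModule("tcp_connect")
	fmt.Println(m.Prober, m.Timeout, err)

	_, err = lookupModule("smtp") // not present in the table above
	fmt.Println(err)
}
```

The practical takeaway is that the `module` value in a Probe object has to match one of the keys under `modules`; a name that is not in the configuration cannot be probed.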

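IV. Creating Probe objects

1. Probe ping

Create a ping job:

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: ping
  namespace: monitoring
spec:
  jobName: ping # job name
  prober: # address of blackbox-exporter
    url: blackbox-exporter.monitoring:19115
  module: icmp # probe module defined in the configuration file
  targets: # targets (either a static config or an ingress config)
    # ingress <Object>
    staticConfig: # if ingress is also configured, the static config takes precedence
      static:
        # note: the icmp prober resolves the target as a host name;
        # a bare host (www.baidu.com) is the safer form
        - https://www.baidu.com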

After a short wait, the job appears in the Prometheus UI.

The corresponding configuration generated for Prometheus:

- job_name: probe/monitoring/ping
  honor_timestamps: true
  params:
    module:
    - icmp
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: ping
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter.monitoring:19115
    action: replace
  static_configs:
  - targets:
    - https://www.baidu.com
    labels:
      namespace: monitoring
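
The relabel rules are the part that turns "a list of targets" into "requests to blackbox-exporter about those targets". The Go sketch below replays their effect by hand for a single target. `relabelProbeTarget` is a hypothetical function written for this article and is not Prometheus's relabel engine; the `__tmp_prometheus_job_name` step is left out for brevity.

```go
package main

import "fmt"

// relabelProbeTarget mimics the effect of the generated relabel_configs
// on one static target. Hand-written simplification for illustration.
func relabelProbeTarget(target, jobName, proberAddr string) map[string]string {
	// Before relabeling, the static target is the scrape address.
	labels := map[string]string{"__address__": target}

	// job is overwritten with the Probe's jobName.
	labels["job"] = jobName
	// __address__ -> __param_target: becomes the ?target= query parameter.
	labels["__param_target"] = labels["__address__"]
	// __param_target -> instance: keeps the probed target visible on the series.
	labels["instance"] = labels["__param_target"]
	// __address__ is replaced last, so the scrape itself goes to blackbox-exporter.
	labels["__address__"] = proberAddr
	return labels
}

func main() {
	l := relabelProbeTarget("https://www.baidu.com", "ping", "blackbox-exporter.monitoring:19115")
	// Print in a fixed order; map iteration order is not deterministic.
	for _, k := range []string{"__address__", "__param_target", "instance", "job"} {
		fmt.Printf("%s=%s\n", k, l[k])
	}
}
```

The order of the rules matters: the original address has to be copied into `__param_target` before `__address__` is overwritten, otherwise the target would be lost.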

2. Probe HTTP

Create an HTTP job:

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: domain-probe
  namespace: monitoring
spec:
  jobName: domain-probe # job name
  prober: # address of blackbox-exporter
    url: blackbox-exporter:19115
  module: http_2xx # probe module defined in the configuration file
  targets: # targets (either a static config or an ingress config)
    # ingress <Object>
    staticConfig: # if ingress is also configured, the static config takes precedence
      static:
        - prometheus.io

After a short wait, the job appears in the Prometheus UI.

The corresponding configuration generated for Prometheus:

- job_name: probe/monitoring/domain-probe
  honor_timestamps: true
  params:
    module:
    - http_2xx
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: domain-probe
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter:19115
    action: replace
  static_configs:
  - targets:
    - prometheus.io
    labels:
      namespace: monitoring

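3. Viewing the scraped metrics

You can query blackbox-exporter directly with curl, passing the probe module and the probe target. blackbox-exporter runs the probe and returns the result in metrics format. Quote the URL so that the shell does not interpret the & character:

curl "http://192.168.0.1:31392/probe?target=prometheus.io&module=http_2xx"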
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.275433879
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 2.373368898
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.400100412
probe_http_duration_seconds{phase="processing"} 0.509387522
probe_http_duration_seconds{phase="resolve"} 0.365111732
probe_http_duration_seconds{phase="tls"} 1.200170298
probe_http_duration_seconds{phase="transfer"} 0.000451343
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 15757
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 2
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.590428662e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.686095999e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.686095999e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="99ac7e7bf8d38ce32c95b2b3c965a9d2b479b0bf2e3b40c576173131a249f877"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.3"} 1
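
For a quick check outside Prometheus, the value most people care about is probe_success. The following Go sketch pulls a single unlabelled sample out of text in the format shown above. `parseMetric` is a hypothetical helper for this article that handles only the simple "name value" lines; a real consumer would more likely use a proper exposition-format parser.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMetric scans Prometheus text-format output for an unlabelled
// sample with the given name and returns its value.
// Labelled samples (name{...}) and HELP/TYPE comments are skipped.
func parseMetric(body, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == name {
			v, err := strconv.ParseFloat(fields[1], 64)
			if err != nil {
				return 0, false
			}
			return v, true
		}
	}
	return 0, false
}

func main() {
	// A few lines copied from the probe output above.
	sample := `# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
`
	if v, ok := parseMetric(sample, "probe_success"); ok {
		fmt.Println("probe succeeded:", v == 1)
	}
}
```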

V. Probe source code analysis

Prometheus-Operator handles Probe CRD objects in much the same way as it handles other CRD objects:

  • First, an informer watches for changes to Probe CRD objects;
  • Then, a new Prometheus configuration is generated from the updated objects and reloaded into Prometheus.

1. Watching Probe CRD objects

Changes to Probe CRD objects are observed through an Informer.

First, the Informer is created:

// prometheus-operator/pkg/prometheus/operator.go
// New creates a new controller.
func New(ctx context.Context, conf operator.Config, logger log.Logger, r prometheus.Registerer) (*Operator, error) {
    ...
    c := &Operator{
        ...
    }
    ...
    c.probeInfs, err = informers.NewInformersForResource(
        informers.NewMonitoringInformerFactories(
            c.config.Namespaces.AllowList,
            c.config.Namespaces.DenyList,
            mclient,
            resyncPeriod,
            nil,
        ),
        monitoringv1.SchemeGroupVersion.WithResource(monitoringv1.ProbeName),
    )
    if err != nil {
        return nil, errors.Wrap(err, "error creating probe informers")
    }
    ...
    return c, nil
}

Then, event handlers are added to the Informer:

// prometheus-operator/pkg/prometheus/operator.go
// addHandlers adds the eventhandlers to the informers.
func (c *Operator) addHandlers() {
    ...
    c.probeInfs.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc:    c.handleBmonAdd,
        UpdateFunc: c.handleBmonUpdate,
        DeleteFunc: c.handleBmonDelete,
    })
    ...
}

Here is the Add event handler:

  • it enqueues the namespace that the object belongs to.

// TODO: Don't enqueue just for the namespace
func (c *Operator) handleBmonAdd(obj interface{}) {
    if o, ok := c.getObject(obj); ok {
        level.Debug(c.logger).Log("msg", "Probe added")
        c.metrics.TriggerByCounter(monitoringv1.ProbesKind, "add").Inc()
        c.enqueueForMonitorNamespace(o.GetNamespace())
    }
}
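
One consequence of enqueueing by namespace rather than by object is that a burst of Probe events in the same namespace can collapse into a single pending item. The Go sketch below illustrates that idea with a tiny de-duplicating FIFO. `nsQueue` is a hypothetical type written for this article, it is not goroutine-safe, and it is not the queue implementation the operator actually uses.

```go
package main

import "fmt"

// nsQueue is a tiny FIFO that drops keys which are already pending.
// Illustration only: single-goroutine use, no rate limiting or retries.
type nsQueue struct {
	pending map[string]bool
	order   []string
}

func newNSQueue() *nsQueue {
	return &nsQueue{pending: map[string]bool{}}
}

// Add enqueues a namespace unless it is already waiting to be processed.
func (q *nsQueue) Add(ns string) {
	if q.pending[ns] {
		return
	}
	q.pending[ns] = true
	q.order = append(q.order, ns)
}

// Get removes and returns the oldest pending namespace.
func (q *nsQueue) Get() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	ns := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, ns)
	return ns, true
}

// Len reports how many namespaces are pending.
func (q *nsQueue) Len() int { return len(q.order) }

func main() {
	q := newNSQueue()
	// Three Probe events, two of them in the same namespace.
	q.Add("monitoring")
	q.Add("monitoring")
	q.Add("default")
	fmt.Println("pending:", q.Len())
	for ns, ok := q.Get(); ok; ns, ok = q.Get() {
		fmt.Println("reconcile namespace", ns)
	}
}
```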

2. Generating the Prometheus configuration

Inside Prometheus-operator, worker goroutines take changed objects off the queue and reconcile them.

// prometheus-operator/pkg/prometheus/operator.go
func (c *Operator) sync(ctx context.Context, key string) error {
    ...
    // Probe objects are handled here
    if err := c.createOrUpdateConfigurationSecret(ctx, p, ruleConfigMapNames, assetStore); err != nil {
        return errors.Wrap(err, "creating config failed")
    }
    ...
}

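For Probe objects, the Prometheus configuration is generated from their contents and written into a Secret.
In other words, the operator never talks to Prometheus directly here: the configuration lands in a Secret object, and the reloader sidecar then reloads the contents of that Secret into Prometheus.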
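
The sidecar half of that hand-off can be pictured as "reload only when the configuration bytes actually change". The Go sketch below shows that idea under loud assumptions: the `reloader` type and its `apply` method are hypothetical names for this article, change detection by SHA-256 is an assumption about how such a sidecar might work, and the `reload` callback merely stands in for whatever triggers Prometheus to re-read its configuration (for example a POST to its /-/reload endpoint).

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// reloader remembers the checksum of the last applied configuration and
// calls reload only when new bytes differ. Hypothetical sketch.
type reloader struct {
	lastSum [sha256.Size]byte
	reload  func() error // assumption: stands in for triggering a Prometheus reload
}

// apply reports whether a reload was triggered for cfg.
func (r *reloader) apply(cfg []byte) (bool, error) {
	sum := sha256.Sum256(cfg)
	if sum == r.lastSum {
		return false, nil // unchanged: nothing to do
	}
	if err := r.reload(); err != nil {
		return false, err // keep the old checksum so the next attempt retries
	}
	r.lastSum = sum
	return true, nil
}

func main() {
	reloads := 0
	r := &reloader{reload: func() error { reloads++; return nil }}

	for _, cfg := range []string{"job: ping", "job: ping", "job: domain-probe"} {
		changed, err := r.apply([]byte(cfg))
		fmt.Println(changed, err)
	}
	fmt.Println("reloads:", reloads)
}
```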

func (c *Operator) createOrUpdateConfigurationSecret(ctx context.Context, p *monitoringv1.Prometheus, ruleConfigMapNames []string, store *assets.Store) error {
    ...
    // select the Probe objects
    bmons, err := c.selectProbes(ctx, p, store)
    if err != nil {
        return errors.Wrap(err, "selecting Probes failed")
    }
    ...
    // generate the new configuration
    conf, err := c.configGenerator.generateConfig(
        p,
        smons,
        pmons,
        bmons,
        store.BasicAuthAssets,
        store.BearerTokenAssets,
        additionalScrapeConfigs,
        additionalAlertRelabelConfigs,
        additionalAlertManagerConfigs,
        ruleConfigMapNames,
    )
    if err != nil {
        return errors.Wrap(err, "generating config failed")
    }

    // write the configuration into the Secret object
    s := makeConfigSecret(p, c.config)
    ...
}

The details of how a Probe object is turned into Prometheus configuration:

// pkg/prometheus/promcfg.go
func (cg *configGenerator) generateProbeConfig(
    version semver.Version,
    m *v1.Probe,
    apiserverConfig *v1.APIServerConfig,
    basicAuthSecrets map[string]assets.BasicAuthCredentials,
    bearerTokens map[string]assets.BearerToken,
    ignoreHonorLabels bool,
    overrideHonorTimestamps bool,
    ignoreNamespaceSelectors bool,
    enforcedNamespaceLabel string) yaml.MapSlice {

    jobName := fmt.Sprintf("probe/%s/%s", m.Namespace, m.Name)
    cfg := yaml.MapSlice{
        {
            Key:   "job_name",
            Value: jobName,
        },
    }
    ...
    // metrics_path
    path := "/probe"
    if m.Spec.ProberSpec.Path != "" {
        path = m.Spec.ProberSpec.Path
    }
    cfg = append(cfg, yaml.MapItem{Key: "metrics_path", Value: path})
    ...
    // params
    cfg = append(cfg, yaml.MapItem{Key: "params", Value: yaml.MapSlice{
        {Key: "module", Value: []string{m.Spec.Module}},
    }})
    ...
    // static_configs
    if m.Spec.Targets.StaticConfig != nil {
        staticConfig := yaml.MapSlice{
            {Key: "targets", Value: m.Spec.Targets.StaticConfig.Targets},
        }
        if m.Spec.Targets.StaticConfig.Labels != nil {
            if _, ok := m.Spec.Targets.StaticConfig.Labels["namespace"]; !ok {
                m.Spec.Targets.StaticConfig.Labels["namespace"] = m.Namespace
            }
        } else {
            m.Spec.Targets.StaticConfig.Labels = map[string]string{"namespace": m.Namespace}
        }
        staticConfig = append(staticConfig, yaml.MapSlice{
            {Key: "labels", Value: m.Spec.Targets.StaticConfig.Labels},
        }...)
        cfg = append(cfg, yaml.MapItem{
            Key:   "static_configs",
            Value: []yaml.MapSlice{staticConfig},
        })
        ...
    }
    ...
    return cfg
}
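
Stripped of the YAML plumbing, the excerpt applies three defaulting rules: the job name is probe/<namespace>/<name>, the metrics path falls back to /probe, and a namespace label is added unless the user already set one. The Go sketch below restates just those rules using only the standard library. The `probe` and `scrapeJob` types and `buildJob` are hypothetical stand-ins for this article, and unlike the excerpt the sketch copies the label map instead of modifying the Probe object in place.

```go
package main

import "fmt"

// probe is a trimmed-down, hypothetical stand-in for the fields of
// v1.Probe that the excerpt reads.
type probe struct {
	Namespace, Name string
	ProberPath      string
	Module          string
	Targets         []string
	Labels          map[string]string
}

// scrapeJob holds the handful of generated settings discussed in the text.
type scrapeJob struct {
	JobName     string
	MetricsPath string
	Module      string
	Targets     []string
	Labels      map[string]string
}

// buildJob applies the same defaulting rules as the excerpt.
func buildJob(p probe) scrapeJob {
	path := "/probe" // default when the Probe does not set a path
	if p.ProberPath != "" {
		path = p.ProberPath
	}

	// Copy the labels so the caller's map is left untouched,
	// then default "namespace" to the Probe's own namespace.
	labels := make(map[string]string, len(p.Labels)+1)
	for k, v := range p.Labels {
		labels[k] = v
	}
	if _, ok := labels["namespace"]; !ok {
		labels["namespace"] = p.Namespace
	}

	return scrapeJob{
		JobName:     fmt.Sprintf("probe/%s/%s", p.Namespace, p.Name),
		MetricsPath: path,
		Module:      p.Module,
		Targets:     p.Targets,
		Labels:      labels,
	}
}

func main() {
	job := buildJob(probe{
		Namespace: "monitoring",
		Name:      "domain-probe",
		Module:    "http_2xx",
		Targets:   []string{"prometheus.io"},
	})
	fmt.Printf("%+v\n", job)
}
```

Run against the domain-probe example, the job name, metrics path, module and namespace label line up with the generated configuration shown in section IV.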
