A Complete Guide to Deploying Single-Node VictoriaMetrics on Kubernetes

In the previous article, we covered how to use VictoriaMetrics as long-term storage for Prometheus to monitor large-scale Kubernetes clusters. This article walks through deploying a single-node VictoriaMetrics instance into a Kubernetes cluster.

Deploying VictoriaMetrics

In this article we deploy only a single-node VictoriaMetrics instance; the cluster version will be covered in a later article.

The complete YAML:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: victoriametrics
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=victoriametrics"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  serviceName: victoriametrics
  selector:
    matchLabels:
      app: victoriametrics
  replicas: 1
  template:
    metadata:
      labels:
        app: victoriametrics
    spec:
      containers:
      - args:
        - --storageDataPath=/storage
        - --httpListenAddr=:8428
        - --retentionPeriod=1
        image: victoriametrics/victoria-metrics
        imagePullPolicy: IfNotPresent
        name: victoriametrics
        ports:
        - containerPort: 8428
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /health
            port: 8428
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /health
            port: 8428
          initialDelaySeconds: 120
          timeoutSeconds: 30
        resources:
          limits:
            cpu: 2000m
            memory: 2000Mi
          requests:
            cpu: 2000m
            memory: 2000Mi
        volumeMounts:
        - mountPath: /storage
          name: storage-volume
      restartPolicy: Always
      priorityClassName: system-cluster-critical
      volumes:
      - name: storage-volume
        persistentVolumeClaim:
          claimName: victoriametrics
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 8428
    protocol: TCP
    targetPort: 8428
  selector:
    app: victoriametrics
  type: ClusterIP

Notes:

  • -storageDataPath — path to the data directory. VictoriaMetrics stores all its data there. The default is victoria-metrics-data in the current working directory.
  • -retentionPeriod — retention period for the data, in months. Older data is deleted automatically. The default is 1 month.
  • -httpListenAddr — TCP address to listen on for HTTP requests. By default it listens on port 8428 on all network interfaces.

Deploy it to the cluster with kubectl:

kubectl apply -f victoria.yaml
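Before tailing the logs, it can help to confirm that the StatefulSet has rolled out and that the PVC was bound (assuming the cluster's default StorageClass can provision EBS volumes):

```shell
# Wait for the single replica to become Ready
kubectl rollout status statefulset/victoriametrics -n kube-system

# The PVC should show STATUS=Bound once the EBS volume is provisioned
kubectl get pvc victoriametrics -n kube-system
kubectl get pods -l app=victoriametrics -n kube-system
```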

Then check its logs to verify it is running:

kubectl logs -f victoriametrics-0 -n kube-system
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:11    build version: victoria-metrics-20200528-173751-tags-v1.36.2-0-g0ec43cb8b
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:12    command line flags
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "bigMergeConcurrency" = "0"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "csvTrimTimestamp" = "1ms"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "dedup.minScrapeInterval" = "0s"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "deleteAuthKey" = "secret"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "enableTCP6" = "false"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "envflag.enable" = "false"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "envflag.prefix" = ""
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "fs.disableMmap" = "false"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "graphiteListenAddr" = ""
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "graphiteTrimTimestamp" = "1s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.disableResponseCompression" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.maxGracefulShutdownDuration" = "7s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.pathPrefix" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.shutdownDelay" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "httpAuth.password" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "httpAuth.username" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "httpListenAddr" = ":8428"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "import.maxLineLen" = "104857600"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxListenAddr" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxMeasurementFieldSeparator" = "_"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxSkipSingleField" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxTrimTimestamp" = "1ms"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "insert.maxQueueDuration" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "loggerFormat" = "default"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "loggerLevel" = "INFO"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "loggerOutput" = "stderr"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "maxConcurrentInserts" = "16"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "maxInsertRequestSize" = "33554432"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "maxLabelsPerTimeseries" = "30"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "memory.allowedPercent" = "60"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "metricsAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbHTTPListenAddr" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbListenAddr" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbTrimTimestamp" = "1s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbhttp.maxInsertRequestSize" = "33554432"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbhttpTrimTimestamp" = "1ms"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "pprofAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "precisionBits" = "64"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.config" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.config.dryRun" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.config.strictParse" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.configCheckInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.consulSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.disableCompression" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.discovery.concurrency" = "500"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.discovery.concurrentWaitTime" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.dnsSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.ec2SDCheckInterval" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.fileSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.gceSDCheckInterval" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.kubernetesSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.maxScrapeSize" = "16777216"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.suppressScrapeErrors" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "retentionPeriod" = "1"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.cacheTimestampOffset" = "5m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.disableCache" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.latencyOffset" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.logSlowQueryDuration" = "5s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxConcurrentRequests" = "8"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxExportDuration" = "720h0m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxLookback" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxPointsPerTimeseries" = "30000"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxQueryDuration" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxQueryLen" = "16384"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxQueueDuration" = "10s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxStalenessInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxTagKeys" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxTagValues" = "100000"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxUniqueTimeseries" = "300000"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.minStalenessInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.resetCacheAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "selfScrapeInstance" = "self"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "selfScrapeInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "selfScrapeJob" = "victoria-metrics"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "smallMergeConcurrency" = "0"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "snapshotAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "storageDataPath" = "/storage"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "tls" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "tlsCertFile" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "tlsKeyFile" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "version" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/app/victoria-metrics/main.go:34    starting VictoriaMetrics at ":8428"...
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/app/vmstorage/main.go:50    opening storage at "/storage" with retention period 1 months
2020-05-31T09:54:12.944Z    info    VictoriaMetrics/lib/memory/memory.go:35    limiting caches to 1258291200 bytes, leaving 838860800 bytes to the OS according to -memory.allowedPercent=60
2020-05-31T09:54:12.945Z    info    VictoriaMetrics/lib/storage/storage.go:759    loading MetricName->TSID cache from "/storage/cache/metricName_tsid"...
2020-05-31T09:54:12.945Z    info    VictoriaMetrics/lib/storage/storage.go:764    loaded MetricName->TSID cache from "/storage/cache/metricName_tsid" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.945Z    info    VictoriaMetrics/lib/storage/storage.go:759    loading MetricID->TSID cache from "/storage/cache/metricID_tsid"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:764    loaded MetricID->TSID cache from "/storage/cache/metricID_tsid" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:759    loading MetricID->MetricName cache from "/storage/cache/metricID_metricName"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:764    loaded MetricID->MetricName cache from "/storage/cache/metricID_metricName" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:647    loading curr_hour_metric_ids from "/storage/cache/curr_hour_metric_ids"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:650    nothing to load from "/storage/cache/curr_hour_metric_ids"
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:647    loading prev_hour_metric_ids from "/storage/cache/prev_hour_metric_ids"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:650    nothing to load from "/storage/cache/prev_hour_metric_ids"
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:603    loading next_day_metric_ids from "/storage/cache/next_day_metric_ids"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:606    nothing to load from "/storage/cache/next_day_metric_ids"
2020-05-31T09:54:12.948Z    info    VictoriaMetrics/lib/mergeset/table.go:167    opening table "/storage/indexdb/1614144487D535C6"...
2020-05-31T09:54:12.953Z    info    VictoriaMetrics/lib/mergeset/table.go:201    table "/storage/indexdb/1614144487D535C6" has been opened in 0.005 seconds; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.954Z    info    VictoriaMetrics/lib/mergeset/table.go:167    opening table "/storage/indexdb/1614144487D535C5"...
2020-05-31T09:54:12.959Z    info    VictoriaMetrics/lib/mergeset/table.go:201    table "/storage/indexdb/1614144487D535C5" has been opened in 0.005 seconds; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.969Z    info    VictoriaMetrics/app/vmstorage/main.go:66    successfully opened storage "/storage" in 0.030 seconds; partsCount: 0; blocksCount: 0; rowsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:57    loading rollupResult cache from "/storage/cache/rollupResult"...
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:83    loaded rollupResult cache from "/storage/cache/rollupResult" in 0.000 seconds; entriesCount: 0, sizeBytes: 0
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/app/victoria-metrics/main.go:43    started VictoriaMetrics in 0.031 seconds
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:76    starting http server at http://:8428/
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:77    pprof handlers are exposed at http://:8428/debug/pprof/
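With the pod running, a quick smoke test of the HTTP API is possible from a local machine through a port-forward. As a sketch, the write below uses the Influx line protocol endpoint that single-node VictoriaMetrics exposes on the same port; the metric name `test_metric` and its label are made up for illustration:

```shell
# Forward the service port to localhost (leave running in another terminal)
kubectl port-forward svc/victoriametrics 8428:8428 -n kube-system &

# Health check - should return OK
curl http://localhost:8428/health

# Insert a sample via the Influx line protocol endpoint...
curl -d 'test_metric,label=smoke value=1' http://localhost:8428/write

# ...and read it back in Prometheus export format (the Influx measurement
# and field are joined with '_', so the series is test_metric_value)
curl http://localhost:8428/api/v1/export -d 'match={__name__="test_metric_value"}'
```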

Deploying Prometheus

Since our Prometheus does not use Kubernetes service discovery here, no RBAC authorization is involved. Our cluster runs on AWS EKS, so Prometheus storage is backed by EBS.

The complete YAML:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
data:
  prometheus.yml: |-
    global:
        scrape_interval:     10s
        evaluation_interval: 10s
        external_labels:
            cluster: eks-01
    remote_write:
        - url: "http://victoriametrics:8428/api/v1/write"
          queue_config:
            max_samples_per_send: 10000

    scrape_configs:
        - job_name: 'prometheus'
          static_configs:
            - targets: ['prometheus:9090']
        - job_name: 'victoriametrics'
          static_configs:
            - targets: ['victoriametrics:8428']
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/data/prometheus
        - --storage.tsdb.retention=7d
        image: prom/prometheus:v2.17.2
        imagePullPolicy: IfNotPresent
        name: prometheus
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          limits:
            cpu: 1000m
            memory: 2000Mi
          requests:
            cpu: 1000m
            memory: 2000Mi
        volumeMounts:
        - mountPath: /etc/prometheus
          name: config-volume
        - mountPath: /data
          name: storage-volume
      restartPolicy: Always
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: storage-volume
          mountPath: /data
          subPath: ""
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=prometheus"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  type: ClusterIP

Deploy with kubectl:

kubectl apply -f prometheus.yaml

Then check the logs to verify it is running:

kubectl logs -f prometheus-0 -n kube-system
level=warn ts=2020-05-31T10:14:30.787Z caller=main.go:287 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:333 msg="Starting Prometheus" version="(version=2.17.2, branch=HEAD, revision=18254838fbe25dcc732c950ae05f78ed4db1292c)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:334 build_context="(go=go1.13.10, user=root@9cb154c268a2, date=20200420-08:27:08)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:335 host_details="(Linux 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020 x86_64 prometheus-0 (none))"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:336 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:337 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-05-31T10:14:30.788Z caller=main.go:667 msg="Starting TSDB ..."
level=info ts=2020-05-31T10:14:30.788Z caller=web.go:515 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:575 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=127.912µs
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:683 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:684 msg="TSDB started"
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:788 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2020-05-31T10:14:30.794Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="starting WAL watcher" queue=2f9134
ts=2020-05-31T10:14:30.795Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="replaying WAL" queue=2f9134
level=info ts=2020-05-31T10:14:30.795Z caller=main.go:816 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-05-31T10:14:30.795Z caller=main.go:635 msg="Server is ready to receive web requests."
ts=2020-05-31T10:14:38.218Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="done replaying WAL" duration=7.423760173s
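Once the "done replaying WAL" message appears, remote-written samples should be arriving in VictoriaMetrics. As a sketch, this can be confirmed by querying its Prometheus-compatible read API for the `up` metric of the two scrape jobs, again through a port-forward:

```shell
# Forward the VictoriaMetrics service port to localhost
kubectl port-forward svc/victoriametrics 8428:8428 -n kube-system &

# Both the 'prometheus' and 'victoriametrics' jobs should report up == 1
curl 'http://localhost:8428/api/v1/query?query=up'
```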

Deploying Grafana

Since Grafana must be reachable from outside the cluster, its Service is of type LoadBalancer.

The complete YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:6.7.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 472
        runAsUser: 472
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=grafana"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "project=grafana"
  labels:
    app: grafana
  name: grafana
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  type: LoadBalancer

Deploy Grafana to the cluster:

kubectl apply -f grafana.yaml
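The NLB backing the Service takes a few minutes to provision; its DNS name can then be read from the Service status, after which Grafana is reachable on port 3000 with the admin credentials set via the environment variables above:

```shell
# Print the external hostname of the Grafana LoadBalancer Service
kubectl get svc grafana -n kube-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```

Inside Grafana, a Prometheus-type data source pointing at http://victoriametrics:8428 lets dashboards query the long-term storage directly.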

Then check the logs to verify it is running:

kubectl logs -f grafana-b68fcf96d-dj426 -n kube-system
t=2020-05-31T10:48:41+0000 lvl=info msg="Starting Grafana" logger=server version=6.7.2 commit=423a25fc32 branch=HEAD compiled=2020-04-02T08:02:56+0000
t=2020-05-31T10:48:41+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2020-05-31T10:48:41+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.provisioning=/etc/grafana/provisioning"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.log.mode=console"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_DATA=/var/lib/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_LOGS=/var/log/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_USER=admin"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_PASSWORD=*********"
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Provisioning" logger=settings path=/etc/grafana/provisioning
t=2020-05-31T10:48:41+0000 lvl=info msg="App mode production" logger=settings
t=2020-05-31T10:48:41+0000 lvl=info msg="Initializing SqlStore" logger=server
t=2020-05-31T10:48:41+0000 lvl=info msg="Connecting to DB" logger=sqlstore dbtype=sqlite3

t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update org_user table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Migrate all Read Only Viewers to Viewers"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index dashboard.account_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index dashboard_account_id_slug"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_tag table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index dashboard_tag.dasboard_id_term"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_dashboard_tag_dashboard_id_term - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table dashboard to dashboard_v1 - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_org_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_dashboard_org_id_slug - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy dashboard v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop table dashboard_v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="alter dashboard.data to mediumtext v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column updated_by in dashboard - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column created_by in dashboard - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column gnetId in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add index for gnetId in dashboard"
...(dozens of similar "Executing migration" lines omitted for brevity)...
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create preferences table v3"
t=2020-05-31T10:48:43+0000 lvl=info msg="Created default admin" logger=sqlstore user=admin
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing HTTPServer" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing BackendPluginManager" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing PluginManager" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Starting plugin search" logger=plugins
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing HooksService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing OSSLicensingService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing InternalMetricsService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing RemoteCache" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing RenderingService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing AlertEngine" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing QuotaService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing ServerLockService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing UserAuthTokenService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing DatasourceCacheService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing LoginService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing SearchService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing TracingService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing UsageStatsService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing CleanUpService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing NotificationService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing provisioningServiceImpl" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing Stream Manager"
t=2020-05-31T10:48:43+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=
t=2020-05-31T10:48:43+0000 lvl=info msg="Backend rendering via phantomJS" logger=rendering renderer=phantomJS
t=2020-05-31T10:48:43+0000 lvl=warn msg="phantomJS is deprecated and will be removed in a future release. You should consider migrating from phantomJS to grafana-image-renderer plugin. Read more at https://grafana.com/docs/grafana/latest/administration/image_rendering/" logger=rendering renderer=phantomJS

Obtain the load balancer address of the Grafana Service, then open Grafana in a browser:

Next, configure a Prometheus data source in Grafana using the following URL:

http://victoriametrics:8428
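Instead of clicking through the UI, the data source can also be declared via Grafana's file-based provisioning mechanism. Below is a minimal sketch of such a provisioning file; the file name and mount path are assumptions, and VictoriaMetrics is registered as a `prometheus`-type data source because it implements the Prometheus querying API:

```yaml
# Hypothetical provisioning file, e.g. mounted at
# /etc/grafana/provisioning/datasources/victoriametrics.yaml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus          # VictoriaMetrics speaks the Prometheus query API
    access: proxy
    url: http://victoriametrics:8428
    isDefault: true
```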

Add a dashboard:

Due to space constraints, refer to the official site for ready-made dashboards.
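Dashboards can likewise be provisioned from files rather than created by hand. Here is a minimal sketch of a dashboard provider config; the provider name and the directory holding the dashboard JSON files are illustrative assumptions:

```yaml
# Hypothetical provider file, e.g.
# /etc/grafana/provisioning/dashboards/default.yaml
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards   # directory containing dashboard JSON files
```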

Once the data source and dashboard are in place, the metrics are viewable in Grafana, as shown below:

Summary

This article covered deploying Prometheus, VictoriaMetrics, and Grafana in k8s. In a follow-up article we will cover deploying the cluster version of VictoriaMetrics.
