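In the previous article, we described how to use VictoriaMetrics as long-term storage for Prometheus in order to monitor large-scale k8s clusters. This article covers deploying a single-instance VictoriaMetrics to a k8s cluster.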

Deploying VictoriaMetrics

In this article we deploy only a single-instance VictoriaMetrics. The cluster version will be deployed in a later article.

The complete YAML is as follows:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: victoriametrics
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=victoriametrics"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  # serviceName should match the Service defined at the end of this file
  serviceName: victoriametrics
  selector:
    matchLabels:
      app: victoriametrics
  replicas: 1
  template:
    metadata:
      labels:
        app: victoriametrics
    spec:
      containers:
      - args:
        - --storageDataPath=/storage
        - --httpListenAddr=:8428
        - --retentionPeriod=1
        image: victoriametrics/victoria-metrics
        imagePullPolicy: IfNotPresent
        name: victoriametrics
        ports:
        - containerPort: 8428
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /health
            port: 8428
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /health
            port: 8428
          initialDelaySeconds: 120
          timeoutSeconds: 30
        resources:
          limits:
            cpu: 2000m
            memory: 2000Mi
          requests:
            cpu: 2000m
            memory: 2000Mi
        volumeMounts:
        - mountPath: /storage
          name: storage-volume
      restartPolicy: Always
      priorityClassName: system-cluster-critical
      volumes:
      - name: storage-volume
        persistentVolumeClaim:
          claimName: victoriametrics
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 8428
    protocol: TCP
    targetPort: 8428
  selector:
    app: victoriametrics
  type: ClusterIP

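PS:

  • -storageDataPath: path to the data directory. VictoriaMetrics stores all of its data in this directory. The default is victoria-metrics-data in the current working directory.
  • -retentionPeriod: data retention period, in months. Older data is deleted automatically. The default is 1 month.
  • -httpListenAddr: TCP address to listen on for HTTP requests. By default it listens on port 8428 on all network interfaces.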

Deploy it to k8s with kubectl:

kubectl apply -f victoria.yaml

Then check its logs to see how it is running:

kubectl logs -f victoriametrics-0 -n kube-system
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:11    build version: victoria-metrics-20200528-173751-tags-v1.36.2-0-g0ec43cb8b
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:12    command line flags
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "bigMergeConcurrency" = "0"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "csvTrimTimestamp" = "1ms"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "dedup.minScrapeInterval" = "0s"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "deleteAuthKey" = "secret"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "enableTCP6" = "false"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "envflag.enable" = "false"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "envflag.prefix" = ""
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "fs.disableMmap" = "false"
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "graphiteListenAddr" = ""
2020-05-31T09:54:12.938Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "graphiteTrimTimestamp" = "1s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.disableResponseCompression" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.maxGracefulShutdownDuration" = "7s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.pathPrefix" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "http.shutdownDelay" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "httpAuth.password" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "httpAuth.username" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "httpListenAddr" = ":8428"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "import.maxLineLen" = "104857600"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxListenAddr" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxMeasurementFieldSeparator" = "_"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxSkipSingleField" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "influxTrimTimestamp" = "1ms"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "insert.maxQueueDuration" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "loggerFormat" = "default"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "loggerLevel" = "INFO"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "loggerOutput" = "stderr"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "maxConcurrentInserts" = "16"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "maxInsertRequestSize" = "33554432"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "maxLabelsPerTimeseries" = "30"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "memory.allowedPercent" = "60"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "metricsAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbHTTPListenAddr" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbListenAddr" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbTrimTimestamp" = "1s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbhttp.maxInsertRequestSize" = "33554432"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "opentsdbhttpTrimTimestamp" = "1ms"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "pprofAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "precisionBits" = "64"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.config" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.config.dryRun" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.config.strictParse" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.configCheckInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.consulSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.disableCompression" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.discovery.concurrency" = "500"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.discovery.concurrentWaitTime" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.dnsSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.ec2SDCheckInterval" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.fileSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.gceSDCheckInterval" = "1m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.kubernetesSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.maxScrapeSize" = "16777216"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "promscrape.suppressScrapeErrors" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "retentionPeriod" = "1"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.cacheTimestampOffset" = "5m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.disableCache" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.latencyOffset" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.logSlowQueryDuration" = "5s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxConcurrentRequests" = "8"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxExportDuration" = "720h0m0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxLookback" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxPointsPerTimeseries" = "30000"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxQueryDuration" = "30s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxQueryLen" = "16384"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxQueueDuration" = "10s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxStalenessInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxTagKeys" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxTagValues" = "100000"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.maxUniqueTimeseries" = "300000"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.minStalenessInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "search.resetCacheAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "selfScrapeInstance" = "self"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "selfScrapeInterval" = "0s"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "selfScrapeJob" = "victoria-metrics"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "smallMergeConcurrency" = "0"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "snapshotAuthKey" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "storageDataPath" = "/storage"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "tls" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "tlsCertFile" = ""
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "tlsKeyFile" = "secret"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/lib/logger/flag.go:20    flag "version" = "false"
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/app/victoria-metrics/main.go:34    starting VictoriaMetrics at ":8428"...
2020-05-31T09:54:12.939Z    info    VictoriaMetrics/app/vmstorage/main.go:50    opening storage at "/storage" with retention period 1 months
2020-05-31T09:54:12.944Z    info    VictoriaMetrics/lib/memory/memory.go:35    limiting caches to 1258291200 bytes, leaving 838860800 bytes to the OS according to -memory.allowedPercent=60
2020-05-31T09:54:12.945Z    info    VictoriaMetrics/lib/storage/storage.go:759    loading MetricName->TSID cache from "/storage/cache/metricName_tsid"...
2020-05-31T09:54:12.945Z    info    VictoriaMetrics/lib/storage/storage.go:764    loaded MetricName->TSID cache from "/storage/cache/metricName_tsid" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.945Z    info    VictoriaMetrics/lib/storage/storage.go:759    loading MetricID->TSID cache from "/storage/cache/metricID_tsid"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:764    loaded MetricID->TSID cache from "/storage/cache/metricID_tsid" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:759    loading MetricID->MetricName cache from "/storage/cache/metricID_metricName"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:764    loaded MetricID->MetricName cache from "/storage/cache/metricID_metricName" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:647    loading curr_hour_metric_ids from "/storage/cache/curr_hour_metric_ids"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:650    nothing to load from "/storage/cache/curr_hour_metric_ids"
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:647    loading prev_hour_metric_ids from "/storage/cache/prev_hour_metric_ids"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:650    nothing to load from "/storage/cache/prev_hour_metric_ids"
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:603    loading next_day_metric_ids from "/storage/cache/next_day_metric_ids"...
2020-05-31T09:54:12.946Z    info    VictoriaMetrics/lib/storage/storage.go:606    nothing to load from "/storage/cache/next_day_metric_ids"
2020-05-31T09:54:12.948Z    info    VictoriaMetrics/lib/mergeset/table.go:167    opening table "/storage/indexdb/1614144487D535C6"...
2020-05-31T09:54:12.953Z    info    VictoriaMetrics/lib/mergeset/table.go:201    table "/storage/indexdb/1614144487D535C6" has been opened in 0.005 seconds; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.954Z    info    VictoriaMetrics/lib/mergeset/table.go:167    opening table "/storage/indexdb/1614144487D535C5"...
2020-05-31T09:54:12.959Z    info    VictoriaMetrics/lib/mergeset/table.go:201    table "/storage/indexdb/1614144487D535C5" has been opened in 0.005 seconds; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.969Z    info    VictoriaMetrics/app/vmstorage/main.go:66    successfully opened storage "/storage" in 0.030 seconds; partsCount: 0; blocksCount: 0; rowsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:57    loading rollupResult cache from "/storage/cache/rollupResult"...
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:83    loaded rollupResult cache from "/storage/cache/rollupResult" in 0.000 seconds; entriesCount: 0, sizeBytes: 0
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/app/victoria-metrics/main.go:43    started VictoriaMetrics in 0.031 seconds
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:76    starting http server at http://:8428/
2020-05-31T09:54:12.970Z    info    VictoriaMetrics/lib/httpserver/httpserver.go:77    pprof handlers are exposed at http://:8428/debug/pprof/
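One line in this startup log is worth a closer look: VictoriaMetrics reports that it is limiting its caches to 1258291200 bytes and leaving 838860800 bytes to the OS because of -memory.allowedPercent=60. These figures appear to follow directly from the 2000Mi memory limit in the manifest. The short Python sketch below only reproduces that arithmetic; the function name split_memory is hypothetical, and the assumption that VictoriaMetrics takes the container limit as its total memory is inferred from the logged numbers, not from its source code.

```python
from typing import Tuple

MIB = 1024 * 1024  # bytes in one mebibyte (the "Mi" suffix in Kubernetes)


def split_memory(total_bytes: int, allowed_percent: int) -> Tuple[int, int]:
    """Split total memory into (cache_bytes, os_bytes) by a percentage.

    Hypothetical helper: it mirrors the numbers printed in the log,
    it is not VictoriaMetrics' own code.
    """
    cache_bytes = total_bytes * allowed_percent // 100
    return cache_bytes, total_bytes - cache_bytes


# 2000Mi is the memory limit from the StatefulSet; 60 is the flag default.
cache_bytes, os_bytes = split_memory(2000 * MIB, 60)
print(cache_bytes, os_bytes)
```

If the memory limit in the manifest is raised or lowered, the same calculation gives a rough idea of how much of it the caches may occupy.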

Deploying Prometheus

Because our Prometheus does not use k8s service discovery in this article, no RBAC authorization is involved. Our cluster runs on AWS EKS, so Prometheus uses EBS for its storage.

The complete YAML is as follows:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
data:
  prometheus.yml: |-
    global:
      scrape_interval: 10s
      evaluation_interval: 10s
      external_labels:
        cluster: eks-01
    remote_write:
      - url: "http://victoriametrics:8428/api/v1/write"
        queue_config:
          max_samples_per_send: 10000
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['prometheus:9090']
      - job_name: 'victoriametrics'
        static_configs:
          - targets: ['victoriametrics:8428']
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/data/prometheus
        - --storage.tsdb.retention=7d
        image: prom/prometheus:v2.17.2
        imagePullPolicy: IfNotPresent
        name: prometheus
        ports:
        - containerPort: 9090
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        resources:
          limits:
            cpu: 1000m
            memory: 2000Mi
          requests:
            cpu: 1000m
            memory: 2000Mi
        volumeMounts:
        - mountPath: /etc/prometheus
          name: config-volume
        - mountPath: /data
          name: storage-volume
      restartPolicy: Always
      serviceAccountName: prometheus
      initContainers:
      - name: "init-chown-data"
        image: "busybox:latest"
        imagePullPolicy: "IfNotPresent"
        command: ["chown", "-R", "65534:65534", "/data"]
        volumeMounts:
        - name: storage-volume
          mountPath: /data
          subPath: ""
      volumes:
      - configMap:
          defaultMode: 420
          name: prometheus
        name: config-volume
      - name: storage-volume
        persistentVolumeClaim:
          claimName: prometheus
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=prometheus"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
  type: ClusterIP

Deploy it with kubectl:

kubectl apply -f prometheus.yaml

Then check the logs to see how it is running:

kubectl logs -f prometheus-0 -n kube-system
level=warn ts=2020-05-31T10:14:30.787Z caller=main.go:287 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:333 msg="Starting Prometheus" version="(version=2.17.2, branch=HEAD, revision=18254838fbe25dcc732c950ae05f78ed4db1292c)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:334 build_context="(go=go1.13.10, user=root@9cb154c268a2, date=20200420-08:27:08)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:335 host_details="(Linux 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020 x86_64 prometheus-0 (none))"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:336 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:337 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-05-31T10:14:30.788Z caller=main.go:667 msg="Starting TSDB ..."
level=info ts=2020-05-31T10:14:30.788Z caller=web.go:515 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:575 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=127.912µs
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:683 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:684 msg="TSDB started"
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:788 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2020-05-31T10:14:30.794Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="starting WAL watcher" queue=2f9134
ts=2020-05-31T10:14:30.795Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="replaying WAL" queue=2f9134
level=info ts=2020-05-31T10:14:30.795Z caller=main.go:816 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-05-31T10:14:30.795Z caller=main.go:635 msg="Server is ready to receive web requests."
ts=2020-05-31T10:14:38.218Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="done replaying WAL" duration=7.423760173s
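The log shows the remote-write queue starting, but it does not prove that samples actually arrive in VictoriaMetrics. One way to check is to query VictoriaMetrics through its Prometheus-compatible /api/v1/query endpoint and filter on the cluster="eks-01" external label that Prometheus attaches to everything it sends. The following is a minimal Python sketch under stated assumptions: the helper names build_query_url and instant_query are hypothetical, and it assumes the service has been made reachable locally, for example with kubectl port-forward svc/victoriametrics 8428:8428 -n kube-system.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run the query and return the decoded JSON response.

    Performs a real HTTP request, so it only works while the
    endpoint is reachable (e.g. through a port-forward).
    """
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)


# Series written by Prometheus should carry the external label from prometheus.yml.
url = build_query_url("http://localhost:8428", 'up{cluster="eks-01"}')
print(url)
```

With the port-forward running, instant_query("http://localhost:8428", 'up{cluster="eks-01"}') should return a non-empty result once Prometheus has finished replaying its WAL and has shipped its first samples.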

Deploying Grafana

Because Grafana needs to be reachable from outside the cluster, its Service is of type LoadBalancer.

The complete YAML is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:6.7.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: grafana
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /api/health
            port: 3000
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 256Mi
        volumeMounts:
        - mountPath: /var/lib/grafana
          subPath: grafana
          name: storage
      securityContext:
        fsGroup: 472
        runAsUser: 472
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: grafana
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=grafana"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "project=grafana"
  labels:
    app: grafana
  name: grafana
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  type: LoadBalancer

Deploy Grafana to the cluster:

kubectl apply -f grafana.yaml

Then check the logs to see how it is running:

kubectl logs -f grafana-b68fcf96d-dj426 -n kube-system
t=2020-05-31T10:48:41+0000 lvl=info msg="Starting Grafana" logger=server version=6.7.2 commit=423a25fc32 branch=HEAD compiled=2020-04-02T08:02:56+0000
t=2020-05-31T10:48:41+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2020-05-31T10:48:41+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.provisioning=/etc/grafana/provisioning"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.log.mode=console"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_DATA=/var/lib/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_LOGS=/var/log/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_USER=admin"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_PASSWORD=*********"
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Provisioning" logger=settings path=/etc/grafana/provisioning
t=2020-05-31T10:48:41+0000 lvl=info msg="App mode production" logger=settings
t=2020-05-31T10:48:41+0000 lvl=info msg="Initializing SqlStore" logger=server
t=2020-05-31T10:48:41+0000 lvl=info msg="Connecting to DB" logger=sqlstore dbtype=sqlite3
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update org_user table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Migrate all Read Only Viewers to Viewers"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index dashboard.account_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index dashboard_account_id_slug"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_tag table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index dashboard_tag.dasboard_id_term"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_dashboard_tag_dashboard_id_term - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table dashboard to dashboard_v1 - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_org_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_dashboard_org_id_slug - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy dashboard v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop table dashboard_v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="alter dashboard.data to mediumtext v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column updated_by in dashboard - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column created_by in dashboard - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column gnetId in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add index for gnetId in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column plugin_id in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add index for plugin_id in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add index for dashboard_id in dashboard_tag"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard_tag table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column folder_id in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column isFolder in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column has_acl in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column uid in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update uid column values in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add unique index dashboard_org_id_uid"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Remove unique index org_id_slug"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard title length"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add unique index for dashboard_org_id_title_folder_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_provisioning"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table dashboard_provisioning to dashboard_provisioning_tmp_qwerty - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_provisioning v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_provisioning_dashboard_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_provisioning_dashboard_id_name - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy dashboard_provisioning v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop dashboard_provisioning_tmp_qwerty"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add check_sum column"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create data_source table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index data_source.account_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index data_source.account_id_name"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index IDX_data_source_account_id - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_data_source_account_id_name - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table data_source to data_source_v1 - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create data_source table v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_data_source_org_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_data_source_org_id_name - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy data_source v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table data_source_v1 #2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column with_credentials"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add secure json data column"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update data_source table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update initial version to 1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add read_only data column"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Migrate logging ds to loki ds"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update json_data with nulls"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create api_key table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index api_key.account_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index api_key.key"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index api_key.account_id_name"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index IDX_api_key_account_id - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_api_key_key - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_api_key_account_id_name - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table api_key to api_key_v1 - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create api_key table v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_api_key_org_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_api_key_key - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_api_key_org_id_name - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy api_key v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table api_key_v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update api_key table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add expires to api_key table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_snapshot table v4"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop table dashboard_snapshot_v4 #1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_snapshot table v5 
#2"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_dashboard_snapshot_key - v5"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_dashboard_snapshot_delete_key - v5"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_snapshot_user_id - v5"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="alter dashboard_snapshot to mediumtext v2"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard_snapshot table charset"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column external_delete_url to dashboard_snapshots table"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create quota table v1"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_quota_org_id_user_id_target - v1"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update quota table charset"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create plugin_setting table"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_plugin_setting_org_id_plugin_id - v1"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column plugin_version to plugin_settings"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update plugin_setting table charset"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create session table"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table playlist table"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table playlist_item table"t=2020-05-31T10:48:42+0000 lvl=info 
msg="Executing migration" logger=migrator id="create playlist table v2"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create playlist item table v2"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update playlist table charset"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update playlist_item table charset"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop preferences table v2"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop preferences table v3"t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create preferences table v3"t=2020-05-31T10:48:43+0000 lvl=info msg="Created default admin" logger=sqlstore user=admint=2020-05-31T10:48:43+0000 lvl=info msg="Initializing HTTPServer" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing BackendPluginManager" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing PluginManager" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Starting plugin search" logger=pluginst=2020-05-31T10:48:43+0000 lvl=info msg="Initializing HooksService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing OSSLicensingService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing InternalMetricsService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing RemoteCache" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing RenderingService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing AlertEngine" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing QuotaService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing ServerLockService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing UserAuthTokenService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing DatasourceCacheService" 
logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing LoginService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing SearchService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing TracingService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing UsageStatsService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing CleanUpService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing NotificationService" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing provisioningServiceImpl" logger=servert=2020-05-31T10:48:43+0000 lvl=info msg="Initializing Stream Manager"t=2020-05-31T10:48:43+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=t=2020-05-31T10:48:43+0000 lvl=info msg="Backend rendering via phantomJS" logger=rendering renderer=phantomJSt=2020-05-31T10:48:43+0000 lvl=warn msg="phantomJS is deprecated and will be removed in a future release. You should consider migrating from phantomJS to grafana-image-renderer plugin. Read more at https://grafana.com/docs/grafana/latest/administration/image_rendering/" logger=rendering renderer=phantomJS

Get the load balancer (LB) address, then open Grafana:
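One way to look up that address from the command line is sketched below. The Service name `grafana` and the namespace `kube-system` are assumptions, since the Grafana Service manifest is not repeated here; substitute the values from your own deployment.

```shell
#!/usr/bin/env bash
# Sketch only: the Service name "grafana" and namespace "kube-system" are
# assumptions; replace them with the values from your Grafana manifest.
grafana_lb_address() {
  local svc="${1:-grafana}" ns="${2:-kube-system}"
  # AWS load balancers publish a hostname; some other providers publish an IP,
  # in which case read .ip instead of .hostname.
  kubectl get svc "$svc" -n "$ns" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}
```

Calling `grafana_lb_address` prints the hostname to open in a browser. The port to append depends on what the Service exposes, which may differ from the container's 3000 shown in the log.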

Next, add a Prometheus data source in Grafana using the following URL:

http://victoriametrics:8428
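The same data source could also be created through Grafana's HTTP API instead of the UI. This is a minimal sketch, not the method used in this article: `GRAFANA_URL`, the `admin:admin` credentials (Grafana's default, which may have been changed), and the data source name `VictoriaMetrics` are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: GRAFANA_URL, the admin:admin credentials, and the data source
# name "VictoriaMetrics" are assumptions, not values taken from the article.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# JSON body for Grafana's POST /api/datasources endpoint.
# VictoriaMetrics is registered as type "prometheus" because it serves the
# Prometheus query API.
datasource_payload() {
  cat <<'EOF'
{
  "name": "VictoriaMetrics",
  "type": "prometheus",
  "url": "http://victoriametrics:8428",
  "access": "proxy",
  "isDefault": true
}
EOF
}

# Send the payload to Grafana using HTTP basic auth.
create_datasource() {
  curl -sS -u "${GRAFANA_USER:-admin}:${GRAFANA_PASSWORD:-admin}" \
    -H 'Content-Type: application/json' \
    -X POST "$GRAFANA_URL/api/datasources" \
    -d "$(datasource_payload)"
}
```

Run `create_datasource` after setting `GRAFANA_URL` to the LB address. With `access` set to `proxy`, the Grafana server itself issues the queries, so the short name `victoriametrics` only resolves if Grafana runs in the same namespace as the VictoriaMetrics Service.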

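Add a dashboard:

Due to space constraints, refer to the official website for the specific dashboards.

Once the dashboard is created, we can open it. It looks like this: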

Summary

This article covered deploying Prometheus, VictoriaMetrics, and Grafana in k8s. In a later article we will cover deploying the cluster version of VictoriaMetrics.