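In the previous article, we described how to use VictoriaMetrics as long-term storage for Prometheus in order to monitor large-scale k8s clusters. This article covers deploying a single-instance VictoriaMetrics to a k8s cluster.
Deploying VictoriaMetrics
In this article we deploy only a single-instance VictoriaMetrics. We will deploy the cluster version in a later article.
The complete yaml is as follows: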
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: victoriametrics
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=victoriametrics"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  # Must match the name of the Service defined at the end of this file.
  serviceName: victoriametrics
  selector:
    matchLabels:
      app: victoriametrics
  replicas: 1
  template:
    metadata:
      labels:
        app: victoriametrics
    spec:
      containers:
        - args:
            - --storageDataPath=/storage
            - --httpListenAddr=:8428
            # Retention period, in months.
            - --retentionPeriod=1
          image: victoriametrics/victoria-metrics
          imagePullPolicy: IfNotPresent
          name: victoriametrics
          ports:
            - containerPort: 8428
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /health
              port: 8428
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /health
              port: 8428
            initialDelaySeconds: 120
            timeoutSeconds: 30
          resources:
            limits:
              cpu: 2000m
              memory: 2000Mi
            requests:
              cpu: 2000m
              memory: 2000Mi
          volumeMounts:
            # Same path as --storageDataPath above.
            - mountPath: /storage
              name: storage-volume
      restartPolicy: Always
      priorityClassName: system-cluster-critical
      volumes:
        - name: storage-volume
          persistentVolumeClaim:
            claimName: victoriametrics
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: victoriametrics
  name: victoriametrics
  namespace: kube-system
spec:
  ports:
    - name: http
      port: 8428
      protocol: TCP
      targetPort: 8428
  selector:
    app: victoriametrics
  type: ClusterIP
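PS:
- -storageDataPath – path to the data directory. VictoriaMetrics stores all data in this directory. The default is victoria-metrics-data in the current working directory.
- -retentionPeriod – data retention period, in months. Older data is deleted automatically. The default is 1 month.
- -httpListenAddr – TCP address to listen on for http requests. By default it listens on port 8428 on all network interfaces.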
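One way to confirm that these three flags took effect is to pick them out of the startup log, which prints every flag on a line of the form flag "name" = "value". The following is a minimal Python sketch of that check, not part of the original deployment; the helper name parse_flags is hypothetical, and the three sample lines are copied from the startup log shown in this article.

```python
import re

# Matches the flag lines VictoriaMetrics prints at startup, for example:
#   ... flag.go:20 flag "retentionPeriod" = "1"
# Whitespace around the name and "=" is optional so that lines that lost
# their spaces when pasted are still recognized.
FLAG_LINE = re.compile(r'flag\s*"([^"]+)"\s*=\s*"([^"]*)"')


def parse_flags(log_text):
    """Return a dict of flag name -> value found in the startup log text."""
    return dict(FLAG_LINE.findall(log_text))


# Three lines copied from the startup log in this article.
sample_log = '''
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "httpListenAddr" = ":8428"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "retentionPeriod" = "1"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "storageDataPath" = "/storage"
'''

flags = parse_flags(sample_log)
for name in ("storageDataPath", "retentionPeriod", "httpListenAddr"):
    print(f"{name} = {flags[name]}")
```

In practice the text would come from the output of kubectl logs saved to a file, rather than from an inline string.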
Deploy it to k8s with kubectl:
kubectl apply -f victoria.yaml
Then check its logs to see how it is running:
kubectl logs -f victoriametrics-0 -n kube-system
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:11 build version: victoria-metrics-20200528-173751-tags-v1.36.2-0-g0ec43cb8b
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:12 command line flags
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "bigMergeConcurrency" = "0"
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "csvTrimTimestamp" = "1ms"
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "dedup.minScrapeInterval" = "0s"
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "deleteAuthKey" = "secret"
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "enableTCP6" = "false"
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "envflag.enable" = "false"
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "envflag.prefix" = ""
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "fs.disableMmap" = "false"
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "graphiteListenAddr" = ""
2020-05-31T09:54:12.938Z info VictoriaMetrics/lib/logger/flag.go:20 flag "graphiteTrimTimestamp" = "1s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "http.disableResponseCompression" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "http.maxGracefulShutdownDuration" = "7s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "http.pathPrefix" = ""
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "http.shutdownDelay" = "0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "httpAuth.password" = "secret"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "httpAuth.username" = ""
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "httpListenAddr" = ":8428"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "import.maxLineLen" = "104857600"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "influxListenAddr" = ""
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "influxMeasurementFieldSeparator" = "_"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "influxSkipSingleField" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "influxTrimTimestamp" = "1ms"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "insert.maxQueueDuration" = "1m0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "loggerFormat" = "default"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "loggerLevel" = "INFO"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "loggerOutput" = "stderr"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "maxConcurrentInserts" = "16"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "maxInsertRequestSize" = "33554432"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "maxLabelsPerTimeseries" = "30"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "memory.allowedPercent" = "60"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "metricsAuthKey" = "secret"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "opentsdbHTTPListenAddr" = ""
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "opentsdbListenAddr" = ""
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "opentsdbTrimTimestamp" = "1s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "opentsdbhttp.maxInsertRequestSize" = "33554432"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "opentsdbhttpTrimTimestamp" = "1ms"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "pprofAuthKey" = "secret"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "precisionBits" = "64"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.config" = ""
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.config.dryRun" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.config.strictParse" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.configCheckInterval" = "0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.consulSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.disableCompression" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.discovery.concurrency" = "500"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.discovery.concurrentWaitTime" = "1m0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.dnsSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.ec2SDCheckInterval" = "1m0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.fileSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.gceSDCheckInterval" = "1m0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.kubernetesSDCheckInterval" = "30s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.maxScrapeSize" = "16777216"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "promscrape.suppressScrapeErrors" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "retentionPeriod" = "1"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.cacheTimestampOffset" = "5m0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.disableCache" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.latencyOffset" = "30s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.logSlowQueryDuration" = "5s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxConcurrentRequests" = "8"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxExportDuration" = "720h0m0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxLookback" = "0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxPointsPerTimeseries" = "30000"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxQueryDuration" = "30s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxQueryLen" = "16384"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxQueueDuration" = "10s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxStalenessInterval" = "0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxTagKeys" = "secret"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxTagValues" = "100000"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.maxUniqueTimeseries" = "300000"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.minStalenessInterval" = "0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "search.resetCacheAuthKey" = "secret"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "selfScrapeInstance" = "self"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "selfScrapeInterval" = "0s"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "selfScrapeJob" = "victoria-metrics"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "smallMergeConcurrency" = "0"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "snapshotAuthKey" = "secret"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "storageDataPath" = "/storage"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "tls" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "tlsCertFile" = ""
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "tlsKeyFile" = "secret"
2020-05-31T09:54:12.939Z info VictoriaMetrics/lib/logger/flag.go:20 flag "version" = "false"
2020-05-31T09:54:12.939Z info VictoriaMetrics/app/victoria-metrics/main.go:34 starting VictoriaMetrics at ":8428"...
2020-05-31T09:54:12.939Z info VictoriaMetrics/app/vmstorage/main.go:50 opening storage at "/storage" with retention period 1 months
2020-05-31T09:54:12.944Z info VictoriaMetrics/lib/memory/memory.go:35 limiting caches to 1258291200 bytes, leaving 838860800 bytes to the OS according to -memory.allowedPercent=60
2020-05-31T09:54:12.945Z info VictoriaMetrics/lib/storage/storage.go:759 loading MetricName->TSID cache from "/storage/cache/metricName_tsid"...
2020-05-31T09:54:12.945Z info VictoriaMetrics/lib/storage/storage.go:764 loaded MetricName->TSID cache from "/storage/cache/metricName_tsid" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.945Z info VictoriaMetrics/lib/storage/storage.go:759 loading MetricID->TSID cache from "/storage/cache/metricID_tsid"...
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:764 loaded MetricID->TSID cache from "/storage/cache/metricID_tsid" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:759 loading MetricID->MetricName cache from "/storage/cache/metricID_metricName"...
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:764 loaded MetricID->MetricName cache from "/storage/cache/metricID_metricName" in 0.001 seconds; entriesCount: 0; sizeBytes: 0
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:647 loading curr_hour_metric_ids from "/storage/cache/curr_hour_metric_ids"...
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:650 nothing to load from "/storage/cache/curr_hour_metric_ids"
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:647 loading prev_hour_metric_ids from "/storage/cache/prev_hour_metric_ids"...
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:650 nothing to load from "/storage/cache/prev_hour_metric_ids"
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:603 loading next_day_metric_ids from "/storage/cache/next_day_metric_ids"...
2020-05-31T09:54:12.946Z info VictoriaMetrics/lib/storage/storage.go:606 nothing to load from "/storage/cache/next_day_metric_ids"
2020-05-31T09:54:12.948Z info VictoriaMetrics/lib/mergeset/table.go:167 opening table "/storage/indexdb/1614144487D535C6"...
2020-05-31T09:54:12.953Z info VictoriaMetrics/lib/mergeset/table.go:201 table "/storage/indexdb/1614144487D535C6" has been opened in 0.005 seconds; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.954Z info VictoriaMetrics/lib/mergeset/table.go:167 opening table "/storage/indexdb/1614144487D535C5"...
2020-05-31T09:54:12.959Z info VictoriaMetrics/lib/mergeset/table.go:201 table "/storage/indexdb/1614144487D535C5" has been opened in 0.005 seconds; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.969Z info VictoriaMetrics/app/vmstorage/main.go:66 successfully opened storage "/storage" in 0.030 seconds; partsCount: 0; blocksCount: 0; rowsCount: 0; sizeBytes: 0
2020-05-31T09:54:12.970Z info VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:57 loading rollupResult cache from "/storage/cache/rollupResult"...
2020-05-31T09:54:12.970Z info VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:83 loaded rollupResult cache from "/storage/cache/rollupResult" in 0.000 seconds; entriesCount: 0, sizeBytes: 0
2020-05-31T09:54:12.970Z info VictoriaMetrics/app/victoria-metrics/main.go:43 started VictoriaMetrics in 0.031 seconds
2020-05-31T09:54:12.970Z info VictoriaMetrics/lib/httpserver/httpserver.go:76 starting http server at http://:8428/
2020-05-31T09:54:12.970Z info VictoriaMetrics/lib/httpserver/httpserver.go:77 pprof handlers are exposed at http://:8428/debug/pprof/
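The memory line in the log above (limiting caches to 1258291200 bytes, leaving 838860800 bytes to the OS) can be reproduced by hand. The numbers are consistent with VictoriaMetrics taking the container memory limit of 2000Mi from the StatefulSet as the available memory and applying -memory.allowedPercent=60 to it; that reading is an assumption based on the log output, not something the article states. A minimal Python sketch of the arithmetic, with a hypothetical helper name:

```python
MIB = 1024 * 1024


def cache_budget(memory_limit_bytes, allowed_percent):
    """Split a memory limit into (bytes for caches, bytes left to the OS)."""
    cache = memory_limit_bytes * allowed_percent // 100
    return cache, memory_limit_bytes - cache


# resources.limits.memory in the StatefulSet is 2000Mi.
limit = 2000 * MIB
cache, rest = cache_budget(limit, 60)
print(f"limit={limit} cache={cache} rest={rest}")
```

Both results match the values printed by memory.go in the log, so raising the memory limit or the percentage changes the cache budget in the same proportion.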
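Deploying Prometheus
Because our Prometheus does not use k8s service discovery in this article, no RBAC authorization is involved. Our cluster runs on AWS EKS, so we use EBS for the Prometheus storage.
The complete yaml is as follows: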
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
data:
  prometheus.yml: |-
    global:
      scrape_interval: 10s
      evaluation_interval: 10s
      external_labels:
        cluster: eks-01
    # Forward every scraped sample to VictoriaMetrics.
    remote_write:
      - url: "http://victoriametrics:8428/api/v1/write"
        queue_config:
          max_samples_per_send: 10000
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['prometheus:9090']
      - job_name: 'victoriametrics'
        static_configs:
          - targets: ['victoriametrics:8428']
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --storage.tsdb.path=/data/prometheus
            # Deprecated in this version; the startup log suggests
            # --storage.tsdb.retention.time instead.
            - --storage.tsdb.retention=7d
          image: prom/prometheus:v2.17.2
          imagePullPolicy: IfNotPresent
          name: prometheus
          ports:
            - containerPort: 9090
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
            initialDelaySeconds: 30
            timeoutSeconds: 30
          resources:
            limits:
              cpu: 1000m
              memory: 2000Mi
            requests:
              cpu: 1000m
              memory: 2000Mi
          volumeMounts:
            - mountPath: /etc/prometheus
              name: config-volume
            - mountPath: /data
              name: storage-volume
      restartPolicy: Always
      serviceAccountName: prometheus
      initContainers:
        # Hand the data volume to UID/GID 65534 before Prometheus starts.
        - name: "init-chown-data"
          image: "busybox:latest"
          imagePullPolicy: "IfNotPresent"
          command: ["chown", "-R", "65534:65534", "/data"]
          volumeMounts:
            - name: storage-volume
              mountPath: /data
              subPath: ""
      volumes:
        - configMap:
            defaultMode: 420
            name: prometheus
          name: config-volume
        - name: storage-volume
          persistentVolumeClaim:
            claimName: prometheus
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: prometheus
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=prometheus"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: kube-system
spec:
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    app: prometheus
  type: ClusterIP
Deploy it with kubectl:
kubectl apply -f prometheus.yaml
Then check the logs to see how it is running:
kubectl logs -f prometheus-0 -n kube-system
level=warn ts=2020-05-31T10:14:30.787Z caller=main.go:287 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:333 msg="Starting Prometheus" version="(version=2.17.2, branch=HEAD, revision=18254838fbe25dcc732c950ae05f78ed4db1292c)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:334 build_context="(go=go1.13.10, user=root@9cb154c268a2, date=20200420-08:27:08)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:335 host_details="(Linux 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020 x86_64 prometheus-0 (none))"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:336 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2020-05-31T10:14:30.787Z caller=main.go:337 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2020-05-31T10:14:30.788Z caller=main.go:667 msg="Starting TSDB ..."
level=info ts=2020-05-31T10:14:30.788Z caller=web.go:515 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:575 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:624 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2020-05-31T10:14:30.793Z caller=head.go:627 component=tsdb msg="WAL replay completed" duration=127.912µs
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:683 fs_type=EXT4_SUPER_MAGIC
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:684 msg="TSDB started"
level=info ts=2020-05-31T10:14:30.794Z caller=main.go:788 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
ts=2020-05-31T10:14:30.794Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="starting WAL watcher" queue=2f9134
ts=2020-05-31T10:14:30.795Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="replaying WAL" queue=2f9134
level=info ts=2020-05-31T10:14:30.795Z caller=main.go:816 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2020-05-31T10:14:30.795Z caller=main.go:635 msg="Server is ready to receive web requests."
ts=2020-05-31T10:14:38.218Z caller=dedupe.go:112 component=remote level=info remote_name=2f9134 url=http://victoriametrics:8428/api/v1/write msg="done replaying WAL" duration=7.423760173s
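The log shows the remote-write queue starting, but not whether samples actually reach VictoriaMetrics. One way to check is to ask VictoriaMetrics for a series through its Prometheus-compatible query API at /api/v1/query. The sketch below is an illustration under assumptions, not part of the original setup: the helper names are hypothetical, the base URL is the in-cluster Service address from the manifests, and the sample query relies on the cluster: eks-01 external label that Prometheus attaches to remote-written samples.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Service name and port from the manifests in this article.
# The name only resolves from inside the cluster (same namespace).
VM_BASE_URL = "http://victoriametrics:8428"


def build_query_url(expr, base_url=VM_BASE_URL):
    """Build an instant-query URL for the Prometheus-compatible query API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def series_from_response(body):
    """Extract (labels, value) pairs from a Prometheus-style JSON response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise ValueError(f"query failed: {doc}")
    return [(item["metric"], item["value"][1]) for item in doc["data"]["result"]]


def query(expr, base_url=VM_BASE_URL):
    """Run an instant query; needs network access to VictoriaMetrics."""
    with urlopen(build_query_url(expr, base_url), timeout=10) as resp:
        return series_from_response(resp.read().decode("utf-8"))


# Only builds the URL here; call query(...) from a pod inside the cluster.
print(build_query_url('up{cluster="eks-01"}'))
```

If query('up{cluster="eks-01"}') returns entries for the prometheus and victoriametrics jobs, the samples scraped by Prometheus are being stored in VictoriaMetrics.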
Deploying Grafana
Because our Grafana has to be reachable from outside the cluster, its Service is of type LoadBalancer.
The complete yaml is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: kube-system
  labels:
    app: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:6.7.2
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              name: grafana
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: admin
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          resources:
            limits:
              cpu: 100m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 256Mi
          volumeMounts:
            - mountPath: /var/lib/grafana
              subPath: grafana
              name: storage
      securityContext:
        fsGroup: 472
        runAsUser: 472
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: grafana
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana
  namespace: kube-system
  annotations:
    volume.beta.kubernetes.io/aws-block-storage-additional-resource-tags: "project=grafana"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: "project=grafana"
  labels:
    app: grafana
  name: grafana
  namespace: kube-system
spec:
  ports:
    - name: http
      port: 3000
      protocol: TCP
      targetPort: 3000
  selector:
    app: grafana
  type: LoadBalancer
Deploy Grafana to the cluster:
kubectl apply -f grafana.yaml
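Both probes in the manifest call /api/health on port 3000. The kubelet only looks at the HTTP status code, but the endpoint also returns a small JSON document, so a manual check can be slightly stricter. The following Python sketch shows one way to interpret such a response; the function name is hypothetical, and the sample payload is illustrative (its version and commit are taken from the Grafana startup log in this article, and the "database": "ok" field is the one Grafana reports when its database is reachable).

```python
import json


def grafana_is_healthy(status_code, body):
    """Return True when /api/health answered 200 and reports a working database."""
    if status_code != 200:
        return False
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an error page from a load balancer.
        return False
    return isinstance(doc, dict) and doc.get("database") == "ok"


# Illustrative response body, not captured from the running pod.
sample = '{"commit": "423a25fc32", "database": "ok", "version": "6.7.2"}'
print(grafana_is_healthy(200, sample))
```

The status code and body would come from an HTTP request to the Grafana Service on port 3000, for example through the LoadBalancer address.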
Then check the logs to see how it is running:
kubectl logs -f grafana-b68fcf96d-dj426 -n kube-system
t=2020-05-31T10:48:41+0000 lvl=info msg="Starting Grafana" logger=server version=6.7.2 commit=423a25fc32 branch=HEAD compiled=2020-04-02T08:02:56+0000
t=2020-05-31T10:48:41+0000 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
t=2020-05-31T10:48:41+0000 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plugins"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.provisioning=/etc/grafana/provisioning"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from command line" logger=settings arg="default.log.mode=console"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_DATA=/var/lib/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_LOGS=/var/log/grafana"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PLUGINS=/var/lib/grafana/plugins"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_PATHS_PROVISIONING=/etc/grafana/provisioning"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_USER=admin"
t=2020-05-31T10:48:41+0000 lvl=info msg="Config overridden from Environment variable" logger=settings var="GF_SECURITY_ADMIN_PASSWORD=*********"
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
t=2020-05-31T10:48:41+0000 lvl=info msg="Path Provisioning" logger=settings path=/etc/grafana/provisioning
t=2020-05-31T10:48:41+0000 lvl=info msg="App mode production" logger=settings
t=2020-05-31T10:48:41+0000 lvl=info msg="Initializing SqlStore" logger=server
t=2020-05-31T10:48:41+0000 lvl=info msg="Connecting to DB" logger=sqlstore dbtype=sqlite3
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update org_user table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Migrate all Read Only Viewers to Viewers"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index dashboard.account_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index dashboard_account_id_slug"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_tag table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index dashboard_tag.dasboard_id_term"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_dashboard_tag_dashboard_id_term - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table dashboard to dashboard_v1 - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_org_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_dashboard_org_id_slug - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy dashboard v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop table dashboard_v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="alter dashboard.data to mediumtext v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column updated_by in dashboard - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column created_by in dashboard - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column gnetId in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add index for gnetId in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column plugin_id in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add index for plugin_id in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add index for dashboard_id in dashboard_tag"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard_tag table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column folder_id in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column isFolder in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column has_acl in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column uid in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update uid column values in dashboard"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add unique index dashboard_org_id_uid"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Remove unique index org_id_slug"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard title length"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add unique index for dashboard_org_id_title_folder_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_provisioning"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table dashboard_provisioning to dashboard_provisioning_tmp_qwerty - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_provisioning v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_provisioning_dashboard_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_provisioning_dashboard_id_name - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy dashboard_provisioning v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop dashboard_provisioning_tmp_qwerty"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add check_sum column"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create data_source table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index data_source.account_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add unique index data_source.account_id_name"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index IDX_data_source_account_id - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_data_source_account_id_name - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table data_source to data_source_v1 - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create data_source table v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_data_source_org_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_data_source_org_id_name - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy data_source v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table data_source_v1 #2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column with_credentials"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add secure json data column"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update data_source table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update initial version to 1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add read_only data column"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Migrate logging ds to loki ds"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update json_data with nulls"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create api_key table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index api_key.account_id"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index api_key.key"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="add index api_key.account_id_name"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index IDX_api_key_account_id - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_api_key_key - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop index UQE_api_key_account_id_name - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Rename table api_key to api_key_v1 - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create api_key table v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_api_key_org_id - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_api_key_key - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_api_key_org_id_name - v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="copy api_key v1 to v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table api_key_v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update api_key table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add expires to api_key table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_snapshot table v4"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop table dashboard_snapshot_v4 #1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create dashboard_snapshot table v5 #2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_dashboard_snapshot_key - v5"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_dashboard_snapshot_delete_key - v5"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index IDX_dashboard_snapshot_user_id - v5"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="alter dashboard_snapshot to mediumtext v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update dashboard_snapshot table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column external_delete_url to dashboard_snapshots table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create quota table v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_quota_org_id_user_id_target - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update quota table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create plugin_setting table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create index UQE_plugin_setting_org_id_plugin_id - v1"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Add column plugin_version to plugin_settings"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update plugin_setting table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create session table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table playlist table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Drop old table playlist_item table"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create playlist table v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create playlist item table v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update playlist table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="Update playlist_item table charset"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop preferences table v2"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="drop preferences table v3"
t=2020-05-31T10:48:42+0000 lvl=info msg="Executing migration" logger=migrator id="create preferences table v3"
t=2020-05-31T10:48:43+0000 lvl=info msg="Created default admin" logger=sqlstore user=admin
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing HTTPServer" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing BackendPluginManager" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing PluginManager" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Starting plugin search" logger=plugins
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing HooksService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing OSSLicensingService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing InternalMetricsService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing RemoteCache" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing RenderingService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing AlertEngine" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing QuotaService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing ServerLockService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing UserAuthTokenService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing DatasourceCacheService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing LoginService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing SearchService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing TracingService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing UsageStatsService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing CleanUpService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing NotificationService" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing provisioningServiceImpl" logger=server
t=2020-05-31T10:48:43+0000 lvl=info msg="Initializing Stream Manager"
t=2020-05-31T10:48:43+0000 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=
t=2020-05-31T10:48:43+0000 lvl=info msg="Backend rendering via phantomJS" logger=rendering renderer=phantomJS
t=2020-05-31T10:48:43+0000 lvl=warn msg="phantomJS is deprecated and will be removed in a future release. You should consider migrating from phantomJS to grafana-image-renderer plugin. Read more at https://grafana.com/docs/grafana/latest/administration/image_rendering/" logger=rendering renderer=phantomJS
Get the load balancer (LB) address, then open Grafana in a browser.
Next, add a Prometheus data source in Grafana that points to the following URL:
http://victoriametrics:8428
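The same data source can also be declared in a file instead of through the UI. The following is only a minimal sketch using Grafana's data source provisioning format. The display name VictoriaMetrics and the choice to make it the default are assumptions for illustration, not settings taken from this article's deployment.

```yaml
# Hypothetical provisioning file, placed in Grafana's
# provisioning/datasources directory (an assumption, not part of this deployment).
apiVersion: 1
datasources:
  - name: VictoriaMetrics          # display name; an arbitrary choice
    type: prometheus               # VictoriaMetrics serves the Prometheus query API
    access: proxy                  # queries are sent by the Grafana backend
    url: http://victoriametrics:8428
    isDefault: true                # assumption: use it as the default data source
```

Note that the short host name victoriametrics only resolves when Grafana runs in the same namespace as the Service (kube-system in the manifest above). From another namespace, http://victoriametrics.kube-system.svc:8428 should work instead.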
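Add a dashboard:
Due to space constraints, the dashboards themselves are not reproduced here; please refer to the official website.
Once the dashboard has been created, it is ready to view.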
Summary
This article covered deploying Prometheus, VictoriaMetrics, and Grafana in k8s. In a follow-up article, we will cover deploying the cluster version of VictoriaMetrics.