k8s中的监控原理、prometheus采集原理 能够看这个文章
k8s监控指标汇总,prometheus采集k8s原理解析
k8s-mon我的项目介绍
我的项目地址
https://github.com/n9e/k8s-mon
视频介绍
- https://www.bilibili.com/vide...
kube-stats-metrics 没数据
排查思路 dns问题
首先察看k8s-mon-deployment的日志
kubectl logs -l app=k8s-mon-deployment -n kube-admin |grep 8080# 如有下列报错阐明网络不通# err="Get \"http://kube-state-metrics.kube-system:8080/metrics\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
排查dns,在node上申请coredns 服务
root@k8s-local-test-01:/etc/kubernetes/manifests$ kubectl get svc -n kube-system |grep dnskube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 73d
在node上申请 coredns 解析 kube-stats-metrics 域名
# 10.96.0.10 为申请到的coredns svc 地址 # 因为node上的搜寻域没有 svc.cluster.local,所以须要FQDNdig kube-state-metrics.kube-system.svc.cluster.local @10.96.0.10 # 如果失常的话则会有如下 A记录 root@k8s-local-test-01:~$ dig kube-state-metrics.kube-system.svc.cluster.local @10.96.0.10; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.el7_9.3 <<>> kube-state-metrics.kube-system.svc.cluster.local @10.96.0.10;; global options: +cmd;; Got answer:;; WARNING: .local is reserved for Multicast DNS;; You are currently testing what happens when an mDNS query is leaked to DNS;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12799;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1;; WARNING: recursion requested but not available;; OPT PSEUDOSECTION:; EDNS: version: 0, flags:; udp: 4096;; QUESTION SECTION:;kube-state-metrics.kube-system.svc.cluster.local. IN A;; ANSWER SECTION:kube-state-metrics.kube-system.svc.cluster.local. 25 IN A 10.100.30.129;; Query time: 0 msec;; SERVER: 10.96.0.10#53(10.96.0.10);; WHEN: Fri Apr 02 15:14:46 CST 2021;; MSG SIZE rcvd: 141
在node上申请kube-stats-metrics
root@k8s-local-test-01:/etc/kubernetes/manifests$ curl -s 10.100.30.129:8080/metrics |head # HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.# TYPE kube_certificatesigningrequest_labels gauge# HELP kube_certificatesigningrequest_created Unix creation timestamp# TYPE kube_certificatesigningrequest_created gauge# HELP kube_certificatesigningrequest_condition The number of each certificatesigningrequest condition# TYPE kube_certificatesigningrequest_condition gauge# HELP kube_certificatesigningrequest_cert_length Length of the issued cert# TYPE kube_certificatesigningrequest_cert_length gauge# HELP kube_configmap_info Information about configmap.# TYPE kube_configmap_info gauge
如果有输入证实 node上申请 coredns没问题,申请ksm服务也没问题
而后进入k8s-mon-deployment容器中查看
# 进入容器命令kubectl -n kube-admin exec "$(kubectl -nkube-admin get pod -l app=k8s-mon-deployment -o jsonpath='{.items[0].metadata.name}')" -ti -- /bin/sh # ping 一下 kube-state-metrics.kube-systemPING kube-state-metrics.kube-system (10.100.30.129): 56 data bytes64 bytes from 10.100.30.129: seq=0 ttl=64 time=0.097 ms64 bytes from 10.100.30.129: seq=1 ttl=64 time=0.093 ms64 bytes from 10.100.30.129: seq=2 ttl=64 time=0.114 ms64 bytes from 10.100.30.129: seq=3 ttl=64 time=0.124 ms# wget 申请一下ksm服务 / # wget http://kube-state-metrics.kube-system.svc.cluster.local:8080/metrics -O m |head m # HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.# TYPE kube_certificatesigningrequest_labels gauge# HELP kube_certificatesigningrequest_created Unix creation timestamp# TYPE kube_certificatesigningrequest_created gauge# HELP kube_certificatesigningrequest_condition The number of each certificatesigningrequest condition# TYPE kube_certificatesigningrequest_condition gauge# HELP kube_certificatesigningrequest_cert_length Length of the issued cert# TYPE kube_certificatesigningrequest_cert_length gauge# HELP kube_configmap_info Information about configmap.# TYPE kube_configmap_info gaugeConnecting to kube-state-metrics.kube-system.svc.cluster.local:8080 (10.100.30.129:8080)
如果在node上能够获取到,但在pod中获取不到思考 coredns有问题或者容器网络 有问题
打印coredns日志
oot@k8s-local-test-01:/etc/kubernetes/manifests$ kubectl logs -l k8s-app=kube-dns -n kube-system -f .:53[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7CoreDNS-1.7.0linux/amd64, go1.14.4, f59c03d.:53[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7CoreDNS-1.7.0linux/amd64, go1.14.4, f59c03d root@k8s-local-test-01:/etc/kubernetes/manifests$ kubectl logs -l k8s-app=kube-dns -n kube-system |grep -i error
容器网络问题 能够依照这个文档排查 https://juejin.cn/post/684490...
apiserver等服务组件没数据
排查思路 先看看日志 报什么错
在node上手动 带token 申请下apiserver 的metrics
TOKEN=$(kubectl -n kube-admin get secret $(kubectl -n kube-admin get serviceaccount k8s-mon -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode ) curl https://localhost:6443/metrics --header "Authorization: Bearer $TOKEN" --insecure # 如果失常的话能够看到metrics数据
服务组件没有部署在pod中的须要在configMap中给出地址 并设置 user_specified:true
kube_scheduler: user_specified: true addrs: - "https://1.1.1.1:1234/metrics" - "https://2.2.2.2:1234/metrics"
日志中报push到夜莺agent的谬误
level=error ts=2021-04-02T14:44:21.560+08:00 caller=push.go:79 msg=HttpPostPushDataBuildNewHttpPostReqError2 funcName=api-server url=http://localhost:2080/api/collector/push err="Post \"http://localhost:2080/api/collector/push\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
能够将 k8s-mon的日志改为 debug 查看下
command: - /opt/app/k8s-mon - --config.file=/etc/k8s-mon/k8s-mon.yml - --log.level=debug
debug日志会打印每个 阶段的耗时
level=debug ts=2021-04-02T16:19:20.723+08:00 caller=kube_state_metrics.go:244 msg=DoCollectSuccessfullyReadyToPush funcName=kube-stats-metrics metrics_num=3551 time_took_seconds=0.276154232 metric_addr=http://kube-state-metrics.kube-system:8080/metricslevel=debug ts=2021-04-02T16:19:20.733+08:00 caller=kube_controller_manager.go:180 msg=DoCollectSuccessfullyReadyToPush funcName=kube-controller-manager metrics_num=642 time_took_seconds=0.286183625level=debug ts=2021-04-02T16:19:20.845+08:00 caller=push.go:25 msg=PushWorkSuccess funcName=kube-controller-manager url=http://localhost:2080/api/collector/push metricsNum=642 time_took_seconds=0.111731185level=debug ts=2021-04-02T16:19:20.935+08:00 caller=push.go:25 msg=PushWorkSuccess funcName=kube-stats-metrics url=http://localhost:2080/api/collector/push metricsNum=3551 time_took_seconds=0.212283608level=debug ts=2021-04-02T16:19:21.459+08:00 caller=kube_apiserver.go:357 msg=DoCollectSuccessfullyReadyToPush funcName=api-server metrics_num=2168 time_took_seconds=1.012191635level=debug ts=2021-04-02T16:19:21.639+08:00 caller=push.go:25 msg=PushWorkSuccess funcName=api-server url=http://localhost:2080/api/collector/push metricsNum=2168 time_took_seconds=0.179650444
能够到node下面 手动推一条数据给夜莺的agent试试
curl -X POST -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'Content-Length: 183' -H 'Content-Type: application/json' -H 'User-Agent: python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-1160.11.1.el7.x86_64' -d '[{"tagsMap": {"k1": "v1"}, "step": 15, "endpoint": "1", "value": 1, "tags": "k1=v1", "timestamp": 1617346924, "metric": "abc_test", "extra": "", "nid": "1", "counterType": "COUNTER"}]' http://localhost:2080/api/collector/push
localhost 还是127.0.0.1问题?
- 在容器外部ping localhost看看