Helm install prometheus
kubectl create ns monitor
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm search repo prometheus
# prometheus
helm show values prometheus-community/prometheus > prometheus.yaml -n monitor
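The commands above only add the repository and export the default values; the install step itself is not shown. A minimal sketch of it, assuming the release is named prometheus (the name used by the uninstall command) and that the edited prometheus.yaml is passed as the values file. The helper name install_prometheus is hypothetical.

```shell
# Sketch only: the release name "prometheus" and the values file
# "prometheus.yaml" are assumptions taken from the surrounding commands.
install_prometheus() {
  helm install prometheus prometheus-community/prometheus \
    -n monitor \
    -f prometheus.yaml
}

# Usage: install_prometheus
```

Wrapping the command in a function keeps it from running by accident when the snippet is pasted into a shell.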
uninstall
helm uninstall prometheus -n monitor
Helm install ingress
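1. Add the ingress-nginx Helm repository
01. # helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
02. # helm search repo ingress-nginx
Use a release whose APP VERSION is greater than 0.4.2.
2. Download the ingress-nginx chart locally
# helm pull ingress-nginx/ingress-nginx --version 3.6.0
3. Edit the configuration
tar xvf ingress-nginx-3.6.0.tgz
cd ingress-nginx
vim values.yaml
4. Settings to change
a) Image addresses for the Controller and admissionWebhook: the public images need to be synced to the company's internal registry. For versions that differ from this document you have to sync the gcr images yourself; search for how to sync gcr images through Alibaba Cloud, or see https://blog.csdn.net/weixin_39961559/article/details/80739352 or https://blog.csdn.net/sinat_35543900/article/details/103290782. Change repository to registry.cn-beijing.aliyuncs.com/dotbalo/controller and comment out the hash value.
Alternative image address for the Controller and admissionWebhook:
image:
  registry: registry.aliyuncs.com    # change the registry address
  image: google_containers/nginx-ingress-controller    # change the repository and image name
b) Comment out the image hash value;
c) Set hostNetwork to true;
d) Set dnsPolicy to ClusterFirstWithHostNet;
e) Add ingress: "true" to nodeSelector so the controller is deployed to the designated nodes;
f) The default kind is Deployment; change it to kind: DaemonSet;
g) type: the default is LoadBalancer (use this in cloud environments); change it to ClusterIP;
h) Adjust requests to match the actual production environment;
i) Adjust admissionWebhooks to match the actual production environment. Use a release whose APP VERSION is greater than 0.4.2; above that version, enabled does not need to be changed;
j) Change the image address to registry.cn-beijing.aliyuncs.com/dotbalo/kube-webhook-certgen (for an alternative address, see the alternative listed under item a).
5. Deploy ingress: label the nodes that should run ingress
01. Create a namespace called ingress-nginx
# kubectl create ns ingress-nginx
02. List all namespaces and confirm that ingress-nginx was created
# kubectl get ns
03. List all worker nodes
# kubectl get node
04. For example, label the node k8s-master03 so that ingress is deployed on it
# kubectl label node k8s-master03 ingress=true
node/k8s-master03 labeled
05. Install the chart (note the trailing dot)
# helm install ingress-nginx -n ingress-nginx .
06. How fast the images are pulled depends on the registry; Alibaba Cloud is relatively fast inside China. Re-run the command until the pod shows READY 1/1 and STATUS Running.
[root@k8s-master01 ingress-nginx]# kubectl get pod -n ingress-nginx
6. Move the ingress controller to worker nodes. The ingress controller must not run on master nodes; follow the steps in the video to deploy it to worker nodes. A production environment needs at least three ingress controllers, preferably on dedicated nodes.
kubectl label node k8s-node01 ingress=true
kubectl label node k8s-master03 ingress-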
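The edits listed in step 4 can also be kept in a small override file instead of being made in values.yaml directly. A sketch, assuming the key names of the ingress-nginx 3.x chart (check them against the values.yaml you unpacked). The file name ingress-override.yaml is hypothetical, and the empty digest relies on the assumption that the chart only appends a digest when the value is non-empty.

```shell
# Write the overrides from step 4 to a separate file.
# Key names are assumed from the ingress-nginx 3.x chart layout.
cat > ingress-override.yaml <<'EOF'
controller:
  image:
    repository: registry.cn-beijing.aliyuncs.com/dotbalo/controller
    digest: ""            # assumption: an empty value disables the digest pin
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
  kind: DaemonSet         # chart default is Deployment
  nodeSelector:
    ingress: "true"       # matches the node label applied in step 5
  service:
    type: ClusterIP       # chart default is LoadBalancer
  admissionWebhooks:
    patch:
      image:
        repository: registry.cn-beijing.aliyuncs.com/dotbalo/kube-webhook-certgen
EOF
```

The file would then be passed at install time with helm install ingress-nginx -n ingress-nginx . -f ingress-override.yaml, which leaves the unpacked chart unmodified.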
ingress-nginx configuration and usage
[root@vm2 ~]# cat ingress_alertmanager.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: alertmanager-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
  - host: "alertmanager.test.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: prometheus-alertmanager
            port:
              number: 9093
[root@vm2 ~]#
[root@vm2 ~]# cat ingress_prometheus.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitor
spec:
  ingressClassName: nginx
  rules:
  - host: "prometheus.test.com"
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: prometheus-server
            port:
              number: 80
Apply: kubectl apply -f ingress_alertmanager.yaml
Prometheus configuration
[root@vm2 ~]# kubectl -n monitor get cm
NAME                      DATA   AGE
prometheus-alertmanager   1      18h
prometheus-server         6      18h
kubectl -n monitor edit cm prometheus-server
..
- job_name: node-instance
  honor_timestamps: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - 192.168.1x.11:9100
    - 192.168.1x.16:9100
Verify
Troubleshooting
pod ImagePullBackOff
kubectl describe pod prometheus-kube-state-metrics-xxxx -n monitor
kubectl edit pod prometheus-kube-state-metrics-xxxx -n monitor
As before, look for an equivalent image on Docker Hub, then change the image with kubectl edit pod.
Replace k8s.gcr.io/kube-state-metrics/kube-state-metrics with: docker.io/dyrnq/kube-state-metrics:v2.3.0
Pod Pending
How to approach the problem:
- Use logs or describe to locate the problem
# kubectl describe pod prometheus-server-6d4664d595-pch8q -n monitor
..
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  117s (x619 over 15h)  default-scheduler  0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
- The error "6 pod has unbound immediate PersistentVolumeClaims" means the pod's PVC has not been bound to a volume
Check the actual state of the PVCs in the namespace
[root@vm2 ~]# kubectl get pvc -n monitor
NAME                                STATUS    VOLUME                    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-server                   Pending                                                                      15h
storage-prometheus-alertmanager-0   Bound     pv-volume-alertmanager1   2Gi        RWO                           15h
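- The prometheus-server PVC is Pending and not bound to any PV, so a PV has to be created
- Look up the settings the matching PV needs
Use: helm show values prometheus-community/prometheus. The values were already redirected to the local prometheus.yaml file, so they can be read from that file directly.
Create a PV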
[root@vm2 ~]# cat prometheus_pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  # PersistentVolumes are cluster-scoped, so no namespace is set here
  name: pv-volume-prometheus
  labels:
    type: local
spec:
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: "/home/pv/prometheus/prometheus-server"
Remember to create the directory on the node
- Verify
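A short sketch of that directory step, meant to be run on the node that will host the pod. The default path is the hostPath from prometheus_pv.yaml; the helper name create_pv_dir is hypothetical.

```shell
# Create the hostPath directory that backs the PV.
# Default path: the hostPath value from prometheus_pv.yaml.
create_pv_dir() {
  dir="${1:-/home/pv/prometheus/prometheus-server}"
  mkdir -p "$dir" && echo "created $dir"
}

# Usage: create_pv_dir
```

Once the directory exists and the PV has been applied, kubectl get pvc -n monitor should show the prometheus-server claim as Bound.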
CrashLoopBackOff
1. Use describe to inspect the pod
- The description is not clear enough, so the container logs have to be checked on the node (apparently vm3)
# On vm3
docker ps -a    # look for the exited container
# View the logs
docker logs xxxxID
The problem appears to be caused by permissions
msg="Error opening query log file" file=/data/queries.active err="open /data/queries.active: permission denied"
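This is a permissions problem. The monitoring stack is built on kube-prometheus. In the Prometheus image the file /prometheus/queries.active is owned by user 1000, while the current NFS path prometheus-k8s-db-prometheus-k8s-0 is owned by root (a permissions risk), so the write fails.
Change the permissions on the PV path to 777 so that user 1000 inside the pod can also operate on the files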
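The permission change could look like the sketch below, run on the node (or NFS server) that holds the PV directory. The default path is an assumption taken from the hostPath PV defined earlier in these notes, and the helper name open_pv_permissions is hypothetical.

```shell
# Open the PV directory to all users (mode 777, as described above).
# Default path is an assumption: the hostPath from prometheus_pv.yaml.
open_pv_permissions() {
  dir="${1:-/home/pv/prometheus/prometheus-server}"
  chmod -R 777 "$dir"
  ls -ld "$dir"    # show the resulting mode for a quick check
}

# Usage: open_pv_permissions /path/to/pv/dir
```

After the change, deleting the crashing pod lets its controller recreate it against the now writable directory.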
4. Verify
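Issue: Error: INSTALLATION FAILED: chart requires kubeVersion: >=1.20.0-0 which is incompatible with Kubernetes v1.19.8
Installing ingress-nginx failed because the cluster's Kubernetes version is too old for the chart; pull an older chart version instead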
helm pull ingress-nginx/ingress-nginx --version 3.6.0
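The failure comes from the chart's kubeVersion constraint: >=1.20.0-0 is not satisfied by a v1.19.8 cluster. A rough sketch of that comparison using sort -V (available in GNU coreutils). The helper name version_ge is hypothetical, and this is a simplification of the semver check Helm performs, not Helm's own logic.

```shell
# Return success when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Values taken from the error message above.
if version_ge 1.19.8 1.20.0; then
  echo "cluster satisfies >=1.20.0"
else
  echo "cluster too old: pick an older chart version"
fi
```

To see which chart versions are available to pin, helm search repo ingress-nginx/ingress-nginx --versions lists them together with their app versions.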