
Containers: Deploying prometheus-operator

This post follows the way the KubeSpray project deploys prometheus-operator, then tries to deploy prometheus-operator by hand.

How KubeSpray deploys prometheus-operator

Deployment flow:

  • Deploy the prometheus-operator Deployment;
  • Deploy the other Prometheus components, such as node-exporter and kube-state-metrics.
# cat tasks/prometheus.yml
---
- name: Kubernetes Apps | Make sure {{prometheus_config_dir}} exists
  file:
    path: "{{prometheus_config_dir}}"
    state: directory

- name: Kubernetes Apps | Render templates for Prometheus-operator-deployment
  template:
    src: "{{item}}.yaml.j2"
    dest: "{{prometheus_config_dir}}/{{item}}.yaml"
  with_items:
    - prometheus-operator-deployment

- name: Kubernetes Apps | Copy prometheus-operator manifests to {{prometheus_config_dir}}
  copy:
    src: "{{item}}.yaml"
    dest: "{{prometheus_config_dir}}/{{item}}.yaml"
  with_items:
    - 0namespace-namespace
    - prometheus-operator-0alertmanagerCustomResourceDefinition
    - prometheus-operator-0podmonitorCustomResourceDefinition
    - prometheus-operator-0prometheusCustomResourceDefinition
    - prometheus-operator-0prometheusruleCustomResourceDefinition
    - prometheus-operator-0servicemonitorCustomResourceDefinition
    - prometheus-operator-0thanosrulerCustomResourceDefinition
    - prometheus-operator-clusterRoleBinding
    - prometheus-operator-clusterRole
    - prometheus-operator-serviceAccount
    - prometheus-operator-service
    - prometheus-rules

- name: Kubernetes Apps | apply prometheus-operator
  kube:
    kubectl: "{{bin_dir}}/kubectl"
    filename: "{{prometheus_config_dir}}/{{item}}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{prometheus_operators}}"

- name: Kubernetes Apps | Render templates for Prometheus
  template:
    src: "{{item}}.yaml.j2"
    dest: "{{prometheus_config_dir}}/{{item}}.yaml"
  register: prometheus_reg
  with_items:
    - alertmanager-alertmanager
    - alertmanager-secret
    - alertmanager-serviceAccount
    - alertmanager-serviceMonitor
    - alertmanager-service
    - kube-state-metrics-clusterRoleBinding
    - kube-state-metrics-clusterRole
    - kube-state-metrics-deployment
    - kube-state-metrics-serviceAccount
    - kube-state-metrics-serviceMonitor
    - kube-state-metrics-service
    - node-exporter-clusterRoleBinding
    - node-exporter-clusterRole
    - node-exporter-daemonset
    - node-exporter-serviceAccount
    - node-exporter-serviceMonitor
    - node-exporter-service
    - prometheus-adapter-apiService
    - prometheus-adapter-clusterRoleAggregatedMetricsReader
    - prometheus-adapter-clusterRoleBindingDelegator
    - prometheus-adapter-clusterRoleBinding
    - prometheus-adapter-clusterRoleServerResources
    - prometheus-adapter-clusterRole
    - prometheus-adapter-configMap
    - prometheus-adapter-deployment
    - prometheus-adapter-roleBindingAuthReader
    - prometheus-adapter-serviceAccount
    - prometheus-adapter-serviceMonitor
    - prometheus-adapter-service
    - prometheus-clusterRoleBinding
    - prometheus-clusterRole
    - prometheus-kubeControllerManagerPrometheusDiscoveryService
    - prometheus-kubeSchedulerPrometheusDiscoveryService
    - prometheus-operator-serviceMonitor
    - prometheus-prometheus
    - prometheus-roleBindingConfig
    - prometheus-roleBindingSpecificNamespaces
    - prometheus-roleConfig
    - prometheus-roleSpecificNamespaces
    - prometheus-serviceAccount
    - prometheus-serviceMonitorApiserver
    - prometheus-serviceMonitorCoreDNS
    - prometheus-serviceMonitorKubeControllerManager
    - prometheus-serviceMonitorKubelet
    - prometheus-serviceMonitorKubeScheduler
    - prometheus-serviceMonitor
    - prometheus-service

- name: Kubernetes Apps | Add policies, roles, bindings for Prometheus
  kube:
    kubectl: "{{bin_dir}}/kubectl"
    filename: "{{prometheus_config_dir}}/{{item.item}}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{prometheus_reg.results}}"
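
The `until` / `retries` / `delay` keys on the two `kube` tasks keep re-running the apply until it succeeds, pausing between tries. When following the same flow by hand, a rough shell counterpart might look like the sketch below. The function name `apply_with_retry` and the `RETRIES` / `DELAY` variables are made up for illustration, and using `kubectl apply -f` in place of the `kube` module's `state: latest` is an assumption about how that module behaves.

```sh
# Hypothetical helper: retry `kubectl apply` on one manifest, loosely mirroring
# the playbook's `until: result is succeeded` with `retries` and `delay`.
RETRIES="${RETRIES:-10}"   # maximum number of attempts in this sketch
DELAY="${DELAY:-6}"        # seconds to wait between attempts

apply_with_retry() {
  local manifest
  local attempt
  manifest="$1"
  attempt=1
  while [ "$attempt" -le "$RETRIES" ]; do
    if kubectl apply -f "$manifest"; then
      return 0
    fi
    echo "attempt ${attempt}/${RETRIES} failed for ${manifest}" >&2
    # No need to wait after the last attempt.
    if [ "$attempt" -lt "$RETRIES" ]; then
      sleep "$DELAY"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage (from the directory that holds the rendered manifests):
#   for f in ./operator/*.yaml; do apply_with_retry "$f" || exit 1; done
```

The retry matters mostly for resources that depend on something created a moment earlier, such as custom resources whose CRD is still being registered.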

Deploying prometheus-operator by hand

  1. Label the master node in advance

This is needed because Prometheus is set up to run on the master node.

kubectl label nodes k8s-master node-role.kubernetes.io/master=
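
To confirm the label landed before moving on, a small check like the following may help. The function name `node_has_master_label` is invented for this sketch; it relies only on `kubectl get nodes -l ... -o name`, which prints entries of the form `node/<name>`.

```sh
# Hypothetical check: succeed only if the named node carries the master role label.
node_has_master_label() {
  local node
  node="$1"
  kubectl get nodes -l node-role.kubernetes.io/master -o name \
    | grep -qxF "node/${node}"
}

# Usage:
#   node_has_master_label k8s-master && echo "label present"
```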
  2. Deploy the prometheus-operator Deployment
kubectl create -f .
// File list
[root@k8s-master prometheus]# tree ./operator/
./operator/
├── 0namespace-namespace.yaml
├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
├── prometheus-operator-clusterRoleBinding.yaml
├── prometheus-operator-clusterRole.yaml
├── prometheus-operator-deployment.yaml
├── prometheus-operator-serviceAccount.yaml
├── prometheus-operator-service.yaml
└── prometheus-rules.yaml

0 directories, 13 files
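
The manifests in the next step include custom resources (ServiceMonitor, Prometheus, Alertmanager), so the CRDs created here need to be registered first. One way to wait for that is sketched below. The CRD names assume the `monitoring.coreos.com` API group used by prometheus-operator and are derived from the six CRD file names above; it is worth checking them with `kubectl get crd`. The helper name `wait_for_operator_crds` is hypothetical.

```sh
# Kinds taken from the six CRD manifests listed above (plural resource names assumed).
OPERATOR_CRDS="alertmanagers podmonitors prometheuses prometheusrules servicemonitors thanosrulers"

# Hypothetical helper: block until every prometheus-operator CRD is Established.
wait_for_operator_crds() {
  local kind
  for kind in $OPERATOR_CRDS; do
    kubectl wait --for=condition=Established --timeout=60s \
      "crd/${kind}.monitoring.coreos.com" || return 1
  done
}

# Usage:
#   wait_for_operator_crds && echo "CRDs ready"
```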
  3. Deploy the other Prometheus components
kubectl create -f .
// File list
[root@k8s-master prometheus]# tree ./prometheus/
./prometheus/
├── alertmanager-alertmanager.yaml
├── alertmanager-secret.yaml
├── alertmanager-serviceAccount.yaml
├── alertmanager-serviceMonitor.yaml
├── alertmanager-service.yaml
├── kube-state-metrics-clusterRoleBinding.yaml
├── kube-state-metrics-clusterRole.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-serviceAccount.yaml
├── kube-state-metrics-serviceMonitor.yaml
├── kube-state-metrics-service.yaml
├── node-exporter-clusterRoleBinding.yaml
├── node-exporter-clusterRole.yaml
├── node-exporter-daemonset.yaml
├── node-exporter-serviceAccount.yaml
├── node-exporter-serviceMonitor.yaml
├── node-exporter-service.yaml
├── prometheus-adapter-apiService.yaml
├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
├── prometheus-adapter-clusterRoleBindingDelegator.yaml
├── prometheus-adapter-clusterRoleBinding.yaml
├── prometheus-adapter-clusterRoleServerResources.yaml
├── prometheus-adapter-clusterRole.yaml
├── prometheus-adapter-configMap.yaml
├── prometheus-adapter-deployment.yaml
├── prometheus-adapter-roleBindingAuthReader.yaml
├── prometheus-adapter-serviceAccount.yaml
├── prometheus-adapter-serviceMonitor.yaml
├── prometheus-adapter-service.yaml
├── prometheus-clusterRoleBinding.yaml
├── prometheus-clusterRole.yaml
├── prometheus-kubeControllerManagerPrometheusDiscoveryService.yaml
├── prometheus-kubeSchedulerPrometheusDiscoveryService.yaml
├── prometheus-operator-serviceMonitor.yaml
├── prometheus-prometheus.yaml
├── prometheus-roleBindingConfig.yaml
├── prometheus-roleBindingSpecificNamespaces.yaml
├── prometheus-roleConfig.yaml
├── prometheus-roleSpecificNamespaces.yaml
├── prometheus-serviceAccount.yaml
├── prometheus-serviceMonitorApiserver.yaml
├── prometheus-serviceMonitorCoreDNS.yaml
├── prometheus-serviceMonitorKubeControllerManager.yaml
├── prometheus-serviceMonitorKubelet.yaml
├── prometheus-serviceMonitorKubeScheduler.yaml
├── prometheus-serviceMonitor.yaml
└── prometheus-service.yaml
0 directories, 47 files
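  4. Problem: the alertmanager cluster fails to connect

After the commands above finish, the alertmanager cluster fails to start and reports that it cannot find its peers: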

alertmanager-main-0.alertmanager-operated:9094
alertmanager-main-1.alertmanager-operated:9094
alertmanager-main-2.alertmanager-operated:9094

Start a busybox pod and resolve the name with nslookup:

kubectl run -i --tty --image busybox:1.28.3 dns-test --restart=Never --rm /bin/sh
# nslookup alertmanager-main-1.alertmanager-operated.monitoring
## resolution fails with an error
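
The peer addresses in the error use the short form `<pod>.<service>`, while the nslookup test adds the namespace. Both are prefixes of the full name that Kubernetes DNS serves for a StatefulSet pod behind a headless service. As a small illustration, the helper below spells out the fully qualified names of the three peers so each can be tried with nslookup from the busybox pod. The name `pod_dns_name` is made up, and `cluster.local` is assumed as the cluster domain.

```sh
# Build the DNS name of a StatefulSet pod behind a headless service.
# The cluster domain defaults to cluster.local, which is an assumption.
pod_dns_name() {
  local pod service namespace domain
  pod="$1"
  service="$2"
  namespace="$3"
  domain="${4:-cluster.local}"
  printf '%s.%s.%s.svc.%s\n' "$pod" "$service" "$namespace" "$domain"
}

# Print the names of the three alertmanager peers.
for i in 0 1 2; do
  pod_dns_name "alertmanager-main-${i}" alertmanager-operated monitoring
done
```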

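Name resolution fails. In Kubernetes, coredns handles name resolution, and kube-proxy programs the forwarding rules for Service endpoints. The coredns logs show nothing wrong, so check the kube-proxy log: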

# kubectl logs kube-proxy-krzkc -n kube-system
## many errors here
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]
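
To get a quick sense of how often the error occurs, the matching lines can be counted per pod. This is only a sketch: `count_ipvs_errors` is a hypothetical helper, and the pod name in the usage line is the one from the log excerpt above.

```sh
# Hypothetical helper: count IPVS parse errors in one kube-proxy pod's log.
count_ipvs_errors() {
  local pod
  pod="$1"
  # grep -c prints the count even when it is 0, but then exits non-zero,
  # so `|| true` keeps the function's exit status clean.
  kubectl logs "$pod" -n kube-system \
    | grep -c 'Failed to list IPVS destinations' || true
}

# Usage:
#   count_ipvs_errors kube-proxy-krzkc
```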
  5. Fix: downgrade kube-proxy so the alertmanager cluster can form
  • Upgrade CentOS to 8.2;
  • Downgrade kube-proxy.
    Of these two options, kube-proxy is downgraded here:
# kubectl edit ds kube-proxy -n kube-system
## change its image
## from 1.18.0 to 1.17.6

image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.6
imagePullPolicy: IfNotPresent
name: kube-proxy
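
The same change can be made without an interactive edit. A minimal sketch, assuming the DaemonSet uses a rolling update strategy so that `kubectl rollout status` can track it; `set_kube_proxy_version` is a hypothetical helper, while the registry path and the container name come from the excerpt above.

```sh
# Registry path taken from the image line in the DaemonSet excerpt.
KUBE_PROXY_REPO="registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy"

# Hypothetical helper: pin the kube-proxy DaemonSet to one image tag,
# then wait for the pods to be replaced.
set_kube_proxy_version() {
  local version
  version="$1"
  kubectl -n kube-system set image daemonset/kube-proxy \
    "kube-proxy=${KUBE_PROXY_REPO}:${version}" || return 1
  kubectl -n kube-system rollout status daemonset/kube-proxy --timeout=120s
}

# Usage:
#   set_kube_proxy_version v1.17.6
```

Once the rollout finishes, re-running the nslookup test from the busybox pod shows whether the peer names now resolve.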

Reference: https://blog.csdn.net/cw03192…
