This post mainly follows the way the Kubespray project deploys prometheus-operator, and uses it as a guide for deploying prometheus-operator by hand.

How Kubespray deploys prometheus-operator

Deployment flow:

  • deploy the prometheus-operator Deployment;
  • deploy the other Prometheus components, such as node-exporter and kube-state-metrics.
# cat tasks/prometheus.yml
---
- name: Kubernetes Apps | Make sure {{ prometheus_config_dir }} exists
  file:
    path: "{{ prometheus_config_dir }}"
    state: directory

- name: Kubernetes Apps | Render templates for Prometheus-operator-deployment
  template:
    src: "{{ item }}.yaml.j2"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  with_items:
    - prometheus-operator-deployment

- name: copy prometheus operators to {{ kube_config_dir }}
  copy:
    src: "{{ item }}.yaml"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  with_items:
    - 0namespace-namespace
    - prometheus-operator-0alertmanagerCustomResourceDefinition
    - prometheus-operator-0podmonitorCustomResourceDefinition
    - prometheus-operator-0prometheusCustomResourceDefinition
    - prometheus-operator-0prometheusruleCustomResourceDefinition
    - prometheus-operator-0servicemonitorCustomResourceDefinition
    - prometheus-operator-0thanosrulerCustomResourceDefinition
    - prometheus-operator-clusterRoleBinding
    - prometheus-operator-clusterRole
    - prometheus-operator-serviceAccount
    - prometheus-operator-service
    - prometheus-rules

- name: Kubernetes Apps | apply prometheus-operator
  kube:
    kubectl: "{{ bin_dir }}/kubectl"
    filename: "{{ prometheus_config_dir }}/{{ item }}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{ prometheus_operators }}"

- name: Kubernetes Apps | Render templates for Prometheus
  template:
    src: "{{ item }}.yaml.j2"
    dest: "{{ prometheus_config_dir }}/{{ item }}.yaml"
  register: prometheus_reg
  with_items:
    - alertmanager-alertmanager
    - alertmanager-secret
    - alertmanager-serviceAccount
    - alertmanager-serviceMonitor
    - alertmanager-service
    - kube-state-metrics-clusterRoleBinding
    - kube-state-metrics-clusterRole
    - kube-state-metrics-deployment
    - kube-state-metrics-serviceAccount
    - kube-state-metrics-serviceMonitor
    - kube-state-metrics-service
    - node-exporter-clusterRoleBinding
    - node-exporter-clusterRole
    - node-exporter-daemonset
    - node-exporter-serviceAccount
    - node-exporter-serviceMonitor
    - node-exporter-service
    - prometheus-adapter-apiService
    - prometheus-adapter-clusterRoleAggregatedMetricsReader
    - prometheus-adapter-clusterRoleBindingDelegator
    - prometheus-adapter-clusterRoleBinding
    - prometheus-adapter-clusterRoleServerResources
    - prometheus-adapter-clusterRole
    - prometheus-adapter-configMap
    - prometheus-adapter-deployment
    - prometheus-adapter-roleBindingAuthReader
    - prometheus-adapter-serviceAccount
    - prometheus-adapter-serviceMonitor
    - prometheus-adapter-service
    - prometheus-clusterRoleBinding
    - prometheus-clusterRole
    - prometheus-kubeControllerManagerPrometheusDiscoveryService
    - prometheus-kubeSchedulerPrometheusDiscoveryService
    - prometheus-operator-serviceMonitor
    - prometheus-prometheus
    - prometheus-roleBindingConfig
    - prometheus-roleBindingSpecificNamespaces
    - prometheus-roleConfig
    - prometheus-roleSpecificNamespaces
    - prometheus-serviceAccount
    - prometheus-serviceMonitorApiserver
    - prometheus-serviceMonitorCoreDNS
    - prometheus-serviceMonitorKubeControllerManager
    - prometheus-serviceMonitorKubelet
    - prometheus-serviceMonitorKubeScheduler
    - prometheus-serviceMonitor
    - prometheus-service

- name: Kubernetes Apps | Add policies, roles, bindings for Prometheus
  kube:
    kubectl: "{{ bin_dir }}/kubectl"
    filename: "{{ prometheus_config_dir }}/{{ item.item }}.yaml"
    state: "latest"
  register: result
  until: result is succeeded
  retries: 10
  delay: 6
  with_items: "{{ prometheus_reg.results }}"
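The until / retries: 10 / delay: 6 settings on the two apply tasks amount to a retry loop around each manifest. When applying the same files by hand, a small shell helper can play a similar role. This is only a sketch: retry is a hypothetical helper, not something shipped with Kubespray or kubectl, and it only roughly mirrors the Ansible semantics.

```shell
#!/bin/sh
# retry MAX DELAY CMD...
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts. Hypothetical helper, loosely modelled on the
# retries/delay settings of the Ansible tasks.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage (needs a cluster, so it is left commented out):
# for f in ./operator/*.yaml; do
#     retry 10 6 kubectl apply -f "$f"
# done
```

The usage comment uses kubectl apply rather than kubectl create because apply can be repeated safely, whereas create fails on objects that already exist.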

Deploying prometheus-operator by hand

  1. Label the master node in advance

This is needed because Prometheus is configured to run on the master node:

kubectl label nodes k8s-master node-role.kubernetes.io/master=
  2. Deploy the prometheus-operator Deployment
kubectl create -f ./
# File list
[root@k8s-master prometheus]# tree ./operator/
./operator/
├── 0namespace-namespace.yaml
├── prometheus-operator-0alertmanagerCustomResourceDefinition.yaml
├── prometheus-operator-0podmonitorCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusCustomResourceDefinition.yaml
├── prometheus-operator-0prometheusruleCustomResourceDefinition.yaml
├── prometheus-operator-0servicemonitorCustomResourceDefinition.yaml
├── prometheus-operator-0thanosrulerCustomResourceDefinition.yaml
├── prometheus-operator-clusterRoleBinding.yaml
├── prometheus-operator-clusterRole.yaml
├── prometheus-operator-deployment.yaml
├── prometheus-operator-serviceAccount.yaml
├── prometheus-operator-service.yaml
└── prometheus-rules.yaml

0 directories, 13 files
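The operator directory contains both the six CRDs and objects that depend on them (prometheus-rules.yaml appears to be a PrometheusRule custom resource), so a custom resource can be rejected if its CRD has not been registered yet. One way to guard against that is to wait for the CRDs before creating anything that uses them. The sketch below assumes the CRDs carry the usual monitoring.coreos.com names; wait_for_crds is a hypothetical helper and the 60s timeout is an arbitrary choice.

```shell
#!/bin/sh
# wait_for_crds: block until the six prometheus-operator CRDs report the
# Established condition. The CRD names are assumed from the manifest file
# names plus the monitoring.coreos.com API group.
wait_for_crds() {
    for kind in alertmanagers podmonitors prometheuses \
                prometheusrules servicemonitors thanosrulers; do
        kubectl wait --for condition=established --timeout=60s \
            "crd/$kind.monitoring.coreos.com" || return 1
    done
}

# Usage (needs a cluster, so it is left commented out):
# wait_for_crds && kubectl get crd
```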
  3. Deploy the other Prometheus components
kubectl create -f ./
# File list
[root@k8s-master prometheus]# tree ./prometheus/
./prometheus/
├── alertmanager-alertmanager.yaml
├── alertmanager-secret.yaml
├── alertmanager-serviceAccount.yaml
├── alertmanager-serviceMonitor.yaml
├── alertmanager-service.yaml
├── kube-state-metrics-clusterRoleBinding.yaml
├── kube-state-metrics-clusterRole.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-serviceAccount.yaml
├── kube-state-metrics-serviceMonitor.yaml
├── kube-state-metrics-service.yaml
├── node-exporter-clusterRoleBinding.yaml
├── node-exporter-clusterRole.yaml
├── node-exporter-daemonset.yaml
├── node-exporter-serviceAccount.yaml
├── node-exporter-serviceMonitor.yaml
├── node-exporter-service.yaml
├── prometheus-adapter-apiService.yaml
├── prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
├── prometheus-adapter-clusterRoleBindingDelegator.yaml
├── prometheus-adapter-clusterRoleBinding.yaml
├── prometheus-adapter-clusterRoleServerResources.yaml
├── prometheus-adapter-clusterRole.yaml
├── prometheus-adapter-configMap.yaml
├── prometheus-adapter-deployment.yaml
├── prometheus-adapter-roleBindingAuthReader.yaml
├── prometheus-adapter-serviceAccount.yaml
├── prometheus-adapter-serviceMonitor.yaml
├── prometheus-adapter-service.yaml
├── prometheus-clusterRoleBinding.yaml
├── prometheus-clusterRole.yaml
├── prometheus-kubeControllerManagerPrometheusDiscoveryService.yaml
├── prometheus-kubeSchedulerPrometheusDiscoveryService.yaml
├── prometheus-operator-serviceMonitor.yaml
├── prometheus-prometheus.yaml
├── prometheus-roleBindingConfig.yaml
├── prometheus-roleBindingSpecificNamespaces.yaml
├── prometheus-roleConfig.yaml
├── prometheus-roleSpecificNamespaces.yaml
├── prometheus-serviceAccount.yaml
├── prometheus-serviceMonitorApiserver.yaml
├── prometheus-serviceMonitorCoreDNS.yaml
├── prometheus-serviceMonitorKubeControllerManager.yaml
├── prometheus-serviceMonitorKubelet.yaml
├── prometheus-serviceMonitorKubeScheduler.yaml
├── prometheus-serviceMonitor.yaml
└── prometheus-service.yaml

0 directories, 47 files
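  4. Problem: the alertmanager cluster peers cannot connect

After the commands above finish, the alertmanager cluster fails to start and reports that it cannot find the other peers: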

alertmanager-main-0.alertmanager-operated:9094
alertmanager-main-1.alertmanager-operated:9094
alertmanager-main-2.alertmanager-operated:9094

Start a busybox pod and resolve one of the names with nslookup:

kubectl run -i --tty --image busybox:1.28.3 dns-test --restart=Never --rm /bin/sh
# nslookup alertmanager-main-1.alertmanager-operated.monitoring
## resolution fails with an error
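The names being resolved follow the Kubernetes pattern for pods behind a headless Service: pod name, Service name, namespace, then svc and the cluster domain. A small sketch for generating the fully qualified peer names to test is shown below. peer_fqdn and list_peers are hypothetical helpers, and cluster.local is assumed as the cluster domain because it is the common default; a cluster configured differently would need another value.

```shell
#!/bin/sh
# peer_fqdn POD SERVICE NAMESPACE [CLUSTER_DOMAIN]
# Build the fully qualified DNS name of a pod behind a headless Service.
# cluster.local is an assumed default for the cluster domain.
peer_fqdn() {
    printf '%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "${4:-cluster.local}"
}

# list_peers: print the names of the three alertmanager peers from the
# error message (alertmanager-main-0..2 in the monitoring namespace).
list_peers() {
    for i in 0 1 2; do
        peer_fqdn "alertmanager-main-$i" alertmanager-operated monitoring
    done
}

# Usage inside the busybox pod (left commented out):
# for name in $(list_peers); do nslookup "$name"; done
```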

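Name resolution fails. In Kubernetes, CoreDNS handles name resolution and kube-proxy maintains the rules that route traffic to Service endpoints. The CoreDNS logs show nothing wrong, so check the kube-proxy log next: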

# kubectl logs kube-proxy-krzkc -n kube-system
## there are many errors here
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]
Failed to list IPVS destinations, error: parseIP Error ip [...]
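To get a quick sense of how often this error occurs, the log can be piped through a counter. This is a minimal sketch; count_parseip_errors is a hypothetical helper that simply counts lines containing the parseIP Error text seen in the log.

```shell
#!/bin/sh
# count_parseip_errors: read a log on stdin and print the number of lines
# containing "parseIP Error". grep -c exits non-zero when the count is 0,
# so "|| true" keeps the helper from reporting a failure in that case.
count_parseip_errors() {
    grep -c 'parseIP Error' || true
}

# Usage (needs a cluster, so it is left commented out):
# kubectl logs kube-proxy-krzkc -n kube-system | count_parseip_errors
```

A count that keeps growing between two runs suggests kube-proxy is still failing to read the IPVS rules.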
  5. Fix: downgrade kube-proxy so the alertmanager cluster can form

Two options:

  • upgrade CentOS to 8.2;
  • downgrade kube-proxy.

Here kube-proxy is downgraded:
# kubectl edit ds kube-proxy -n kube-system
## change its image
## from 1.18.0 to 1.17.6
image: registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.6
imagePullPolicy: IfNotPresent
name: kube-proxy
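The same image change can also be made without opening an editor, which is easier to script and repeat. The sketch below uses kubectl set image and kubectl rollout status; set_kube_proxy_image is a hypothetical wrapper, and it assumes the container inside the DaemonSet is named kube-proxy, as the name field in the edited spec suggests.

```shell
#!/bin/sh
# set_kube_proxy_image IMAGE
# Point the kube-proxy DaemonSet at IMAGE, then wait for the rollout.
# Assumes the container in the DaemonSet is named "kube-proxy".
set_kube_proxy_image() {
    image=$1
    kubectl -n kube-system set image daemonset/kube-proxy "kube-proxy=$image" &&
        kubectl -n kube-system rollout status daemonset/kube-proxy
}

# Usage (needs a cluster, so it is left commented out):
# set_kube_proxy_image registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.17.6
```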

Reference: https://blog.csdn.net/cw03192...