Kubernetes: etcd health check fails when joining a new master node


Background:

Yesterday, after a new cluster had been set up, a new problem appeared: one of the master nodes stopped working properly. The cluster could still be used normally, but it was left with a single point of failure. While repairing the node today, the etcd health check failed.

When the node tried to join the cluster, the following error occurred:

The message reports that the etcd health check failed. Start by looking at the kubeadm configuration stored in the Kubernetes cluster.


[root@master-01 ~]# kubectl describe configmaps kubeadm-config -n kube-system
----
apiEndpoints:
  master-01:
    advertiseAddress: 10.0.0.11
    bindPort: 6443
  master-02:
    advertiseAddress: 10.0.0.12
    bindPort: 6443
  master-03:
    advertiseAddress: 10.0.0.13
    bindPort: 6443
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterStatus

Events:  <none>

When the cluster was built, etcd was deployed from a container image on each master. After master-02 ran into problems and was removed from the cluster, its etcd member entry was still stored in the etcd running on every remaining master, so adding the node back fails the etcd health check.
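One way to spot a stale entry is to compare the node names listed under apiEndpoints with the nodes that are actually present. The following is a minimal sketch, not part of the original procedure: list_api_endpoints is a hypothetical helper name, and it assumes awk is available on the host and that the ClusterStatus YAML is indented as in the output above.

```shell
# Hypothetical helper: print the node names found under "apiEndpoints:"
# in a kubeadm ClusterStatus document read from stdin.
list_api_endpoints() {
  awk '
    /^apiEndpoints:/       { in_block = 1; next }   # start of the endpoints map
    in_block && /^[^ ]/    { in_block = 0 }         # next top-level key ends it
    in_block && /^  [^ ].*:$/ {                     # two-space indented keys are node names
      sub(/^  /, ""); sub(/:$/, ""); print
    }
  '
}

# Possible usage on a master (assumes the ConfigMap still has a ClusterStatus key):
#   kubectl get configmap kubeadm-config -n kube-system \
#     -o jsonpath='{.data.ClusterStatus}' | list_api_endpoints
#   kubectl get nodes
```

Any name printed by the helper that does not appear in kubectl get nodes is a candidate for the stale entry.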


At this point the stale etcd member has to be removed by hand from inside an etcd container. First list the etcd pods in the cluster, then open a sh session in one of them.

[root@master-01 ~]# kubectl get pods -n kube-system | grep etcd
[root@master-01 ~]# kubectl exec -it etcd-master-03 -n kube-system -- sh

Once inside the container, run the following:

## Set up the environment
$ export ETCDCTL_API=3
$ alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'

## List the etcd cluster members
$ etcdctl member list

## Remove the etcd member for master-02 (pass its member ID from the list above)
$ etcdctl member remove <member-ID>

## List the etcd cluster members again
$ etcdctl member list

## Leave the container
$ exit
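etcdctl member remove expects the member ID, not the node name, so the ID has to be read from the member list first. The following is a minimal sketch of that lookup, not part of the original procedure: member_id_by_name is a hypothetical helper, it assumes the default comma-separated output of etcdctl member list (ID, status, name, peer URLs, client URLs), and it assumes awk is available wherever it is run.

```shell
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the ID (first column) of the member whose name (third column) equals $1.
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Possible usage inside the etcd container, after the alias above is set:
#   etcdctl member list | member_id_by_name master-02
#   etcdctl member remove <the ID printed by the previous command>
```

Checking the printed ID against the full member list before removing anything is a sensible precaution, since removing the wrong member affects a healthy node.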

View the member list and remove the master that no longer exists.


Join the master to the cluster again, and this time it succeeds.


