Case of unreachable member

A cluster of etcd containers has been created successfully. For example, the creation config file of one of its members looks like this:

{
  "repository_id": "Etcd",
  "image_tag": "0.2.0-guoxiang",
  "name": "etcd-re-create-guoxiang",
  "hostname": "hostname1",
  "datapool": [{
    "mount_id": "etcd_data",
    "mount": "/data",
    "quotagroup": "etcd-guoxiang-05",
    "size": "600",
    "size_unit": "MB",
    "filesystem": "xfs"
  }],
  "ports": {
    "2379/tcp": "3379",
    "2380/tcp": "3380"
  },
  "env": {
    "IP": "10.23.2.109",
    "ETCD_ADVERTISE_CLIENT_URLS": "https://10.23.2.109:3379",
    "ETCD_INITIAL_ADVERTISE_PEER_URLS": "https://10.23.2.109:3380",
    "ETCD_INITIAL_CLUSTER": "hostname1=https://10.23.2.109:3380,hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380",
    "DATA_CENTER": "us-south",
    "PROFILE": "production"
  }
}

Check the cluster status with the following command.

Tip: obtain client-key.pem, client.pem and ca.pem before running the command.

# etcdctl --endpoint https://10.23.2.109:3379,https://10.23.2.108:3379,https://10.23.2.110:3379 --key-file ./client-key.pem --cert-file ./client.pem --ca-file ./ca.pem cluster-health

If the cluster is running normally, the output looks like this:

member 5e9e9b11fe249dad is healthy: got healthy result from https://10.23.2.109:3379
member 685a3bee5e91c225 is healthy: got healthy result from https://10.23.2.108:3379
member eea45c825bf56feb is healthy: got healthy result from https://10.23.2.110:3379
cluster is healthy

If one member has failed, the output may look like this:

failed to check the health of member 5e9e9b11fe249dad on https://10.23.2.109:3379: Get https://10.23.2.109:3379/health: dial tcp 10.23.2.109:3379: connect: connection refused
member 5e9e9b11fe249dad is unreachable: [https://10.23.2.109:3379] are all unreachable
member 685a3bee5e91c225 is healthy: got healthy result from https://10.23.2.108:3379
member eea45c825bf56feb is healthy: got healthy result from https://10.23.2.110:3379
cluster is healthy

The cause is usually one of the following four cases.

Case 1: The whole environment of an etcd container was destroyed.

Solution

First, remove the destroyed member with etcdctl:

# etcdctl --endpoint https://10.23.2.109:3379,https://10.23.2.108:3379,https://10.23.2.110:3379 --key-file ./client2-key.pem --cert-file ./client2.pem --ca-file ./ca.pem member remove 5e9e9b11fe249dad

Here 5e9e9b11fe249dad is the member ID of the unreachable member.

Next, add the new member to the existing cluster. Do this before the new container is created: the remaining members only accept a joining member that has already been registered with member add.

# etcdctl --endpoint https://10.23.2.109:3379,https://10.23.2.108:3379,https://10.23.2.110:3379 --key-file ./client2-key.pem --cert-file ./client2.pem --ca-file ./ca.pem member add <name> <peerURL>

<name> is the hostname in the new container's config file.
<peerURL> is the value of ETCD_INITIAL_ADVERTISE_PEER_URLS in that config file.

Finally, create the new etcd container, adding the following environment variables to env in its config file:

"ETCD_INITIAL_CLUSTER_STATE": "existing"
"ETCD_INITIAL_CLUSTER": <the peer URLs of the cluster, including the new etcd container>

An example is as follows.

{
  "repository_id": "Etcd",
  "image_tag": "0.2.0-guoxiang",
  "name": "etcd-re-create-guoxiang",
  "hostname": "hostname1",
  "datapool": [{
    "mount_id": "etcd_data",
    "mount": "/data",
    "quotagroup": "etcd-guoxiang-05",
    "size": "600",
    "size_unit": "MB",
    "filesystem": "xfs"
  }],
  "ports": {
    "2379/tcp": "3379",
    "2380/tcp": "3380"
  },
  "env": {
    "LPAR_IP": "10.23.2.109",
    "ETCD_ADVERTISE_CLIENT_URLS": "https://10.23.2.109:3379",
    "ETCD_INITIAL_ADVERTISE_PEER_URLS": "https://10.23.2.109:3380",
    "ETCD_INITIAL_CLUSTER": "hostname1=https://10.23.2.109:3380,hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380",
    "ETCD_INITIAL_CLUSTER_STATE": "existing",
    "DATA_CENTER": "us-south",
    "PROFILE": "production"
  }
}

In ETCD_INITIAL_CLUSTER, hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380 are the peer URLs of the members that remain after the destroyed member was removed, and hostname1=https://10.23.2.109:3380 is the new container.

Case 2: The etcd container no longer exists.

Solution

Add "ETCD_INITIAL_CLUSTER_STATE": "existing" to the container creation config file.
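The config changes in Case 1 and Case 2 both amount to editing the env map of an existing creation config. A minimal Python sketch, assuming the config is plain JSON with the layout shown in this section; the helper names `build_initial_cluster` and `rejoin_config` are hypothetical and not part of etcd or any container tooling used here. It relies on etcd's comma-separated name=peerURL format for ETCD_INITIAL_CLUSTER.

```python
import copy
import json


def build_initial_cluster(peers):
    """Join {member name: peer URL} pairs into an ETCD_INITIAL_CLUSTER value.

    Hypothetical helper; produces "name1=url1,name2=url2,..." in dict order.
    """
    return ",".join(f"{name}={url}" for name, url in peers.items())


def rejoin_config(config, peers=None):
    """Return a copy of a creation config set up to join an existing cluster.

    Always sets ETCD_INITIAL_CLUSTER_STATE to "existing" (Case 1 and Case 2).
    When peers is given, also rewrites ETCD_INITIAL_CLUSTER (Case 1).
    The input dict is not modified.
    """
    new_config = copy.deepcopy(config)
    env = new_config.setdefault("env", {})
    env["ETCD_INITIAL_CLUSTER_STATE"] = "existing"
    if peers is not None:
        env["ETCD_INITIAL_CLUSTER"] = build_initial_cluster(peers)
    return new_config


if __name__ == "__main__":
    # Trimmed-down version of the hostname1 config from this section.
    original = {
        "hostname": "hostname1",
        "env": {"ETCD_INITIAL_ADVERTISE_PEER_URLS": "https://10.23.2.109:3380"},
    }
    peers = {
        "hostname1": "https://10.23.2.109:3380",
        "hostname2": "https://10.23.2.108:3380",
        "hostname3": "https://10.23.2.110:3380",
    }
    print(json.dumps(rejoin_config(original, peers), indent=2))
```

Returning a copy rather than editing in place keeps the original config file's contents available for comparison, since Case 2 asks that every other setting stay the same as before.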
An example is as follows.

{
  "repository_id": "DBaaSEtcd",
  "image_tag": "0.2.0-guoxiang",
  "name": "etcd-re-create-guoxiang",
  "hostname": "hostname1",
  "datapool": [{
    "mount_id": "etcd_data",
    "mount": "/data",
    "quotagroup": "etcd-guoxiang-05",
    "size": "600",
    "size_unit": "MB",
    "filesystem": "xfs"
  }],
  "ports": {
    "2379/tcp": "3379",
    "2380/tcp": "3380"
  },
  "env": {
    "LPAR_IP": "10.23.2.109",
    "ETCD_ADVERTISE_CLIENT_URLS": "https://10.23.2.109:3379",
    "ETCD_INITIAL_ADVERTISE_PEER_URLS": "https://10.23.2.109:3380",
    "ETCD_INITIAL_CLUSTER": "hostname1=https://10.23.2.109:3380,hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380",
    "ETCD_INITIAL_CLUSTER_STATE": "existing",
    "DATA_CENTER": "us-south",
    "PROFILE": "production"
  }
}

Create the container with the new config file, keeping all other configuration the same as before.

Case 3: The etcd container was stopped.

Solution

Start the container.

# docker start <container>

Case 4: The etcd service was stopped inside its container.

Solution

Restart the etcd container.

# docker restart <container>

Case of unhealthy member

If a member is unhealthy, follow Case 2 above: remove its container along with its metadata, then create a new one to fix it.
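Every case above starts by picking the failing member out of the cluster-health output shown at the beginning of this section. A minimal Python sketch, assuming the etcdctl output wording in those samples (`member <id> is <status>: ...`); other etcdctl versions may phrase these lines differently, and `member_status` / `failing_members` are hypothetical helper names.

```python
import re

# Matches lines such as
#   "member 685a3bee5e91c225 is healthy: got healthy result from ..."
#   "member 5e9e9b11fe249dad is unreachable: [...] are all unreachable"
# Assumes the etcdctl wording shown in the samples in this section.
MEMBER_LINE = re.compile(r"^member ([0-9a-f]+) is (\w+)")


def member_status(output):
    """Map each member ID to the status word reported for it."""
    status = {}
    for line in output.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match:
            member_id, state = match.groups()
            status[member_id] = state
    return status


def failing_members(output):
    """Return IDs of members whose status is anything other than 'healthy'."""
    return [mid for mid, state in member_status(output).items() if state != "healthy"]


# The "one member failed" sample from this section.
SAMPLE = """\
failed to check the health of member 5e9e9b11fe249dad on https://10.23.2.109:3379: Get https://10.23.2.109:3379/health: dial tcp 10.23.2.109:3379: connect: connection refused
member 5e9e9b11fe249dad is unreachable: [https://10.23.2.109:3379] are all unreachable
member 685a3bee5e91c225 is healthy: got healthy result from https://10.23.2.108:3379
member eea45c825bf56feb is healthy: got healthy result from https://10.23.2.110:3379
cluster is healthy
"""

if __name__ == "__main__":
    print(failing_members(SAMPLE))  # ['5e9e9b11fe249dad']
```

The "failed to check the health" line is skipped because it does not start with "member", so each ID is reported once. The returned ID is the one Case 1 passes to member remove.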
...