ETCD recovery solution



Case of unreachable member
Suppose a cluster of etcd containers has been created successfully. The creation config file for one of its members looks like this:
{
  "repository_id": "Etcd",
  "image_tag": "0.2.0-guoxiang",
  "name": "etcd-re-create-guoxiang",
  "hostname": "hostname1",
  "datapool": [{
    "mount_id": "etcd_data",
    "mount": "/data",
    "quotagroup": "etcd-guoxiang-05",
    "size": "600",
    "size_unit": "MB",
    "filesystem": "xfs"
  }],
  "ports": {
    "2379/tcp": "3379",
    "2380/tcp": "3380"
  },
  "env": {
    "IP": "10.23.2.109",
    "ETCD_ADVERTISE_CLIENT_URLS": "https://10.23.2.109:3379",
    "ETCD_INITIAL_ADVERTISE_PEER_URLS": "https://10.23.2.109:3380",
    "ETCD_INITIAL_CLUSTER": "hostname1=https://10.23.2.109:3380,hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380",
    "DATA_CENTER": "us-south",
    "PROFILE": "production"
  }
}
We can check the cluster status with the following command. Tip: obtain client-key.pem, client.pem and ca.pem before you run it.
# etcdctl --endpoint https://10.23.2.109:3379,https://10.23.2.108:3379,https://10.23.2.110:3379 --key-file ./client-key.pem --cert-file ./client.pem --ca-file ./ca.pem cluster-health
If the cluster is running normally, the output looks like:
member 5e9e9b11fe249dad is healthy: got healthy result from https://10.23.2.109:3379
member 685a3bee5e91c225 is healthy: got healthy result from https://10.23.2.108:3379
member eea45c825bf56feb is healthy: got healthy result from https://10.23.2.110:3379
cluster is healthy
If one member failed, the output may look like:
failed to check the health of member 5e9e9b11fe249dad on https://10.23.2.109:3379: Get https://10.23.2.109:3379/health: dial tcp 10.23.2.109:3379: connect: connection refused
member 5e9e9b11fe249dad is unreachable: [https://10.23.2.109:3379] are all unreachable
member 685a3bee5e91c225 is healthy: got healthy result from https://10.23.2.108:3379
member eea45c825bf56feb is healthy: got healthy result from https://10.23.2.110:3379
cluster is healthy
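To pick the failed member out of a longer health report, the output can be filtered. The following is a minimal sketch, assuming the line format "member <id> is unreachable: ..." shown above; the function name unreachable_members is hypothetical and not part of etcdctl.

```shell
# Print the ID of every member reported as unreachable.
# Reads cluster-health output on stdin and assumes the line format
# "member <id> is unreachable: ..." (an assumption based on the sample above).
unreachable_members() {
  awk '$1 == "member" && $3 == "is" && $4 ~ /^unreachable/ { print $2 }'
}
```

It could be used by piping the cluster-health command above into it, for example: etcdctl ... cluster-health | unreachable_members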
A member usually becomes unreachable for one of the following four reasons.
Case 1: The whole environment of an etcd container was destroyed.
Solution
Remove the destroyed member with etcdctl.
# etcdctl --endpoint https://10.23.2.109:3379,https://10.23.2.108:3379,https://10.23.2.110:3379 --key-file ./client2-key.pem --cert-file ./client2.pem --ca-file ./ca.pem member remove 5e9e9b11fe249dad
5e9e9b11fe249dad is the member ID of the unreachable member.
Create a new etcd container, adding the following environment variables to env in its config file.
"ETCD_INITIAL_CLUSTER_STATE": "existing"
"ETCD_INITIAL_CLUSTER": <the peer URLs of the cluster, including the new etcd container>
An example is as follows.
{
  "repository_id": "Etcd",
  "image_tag": "0.2.0-guoxiang",
  "name": "etcd-re-create-guoxiang",
  "hostname": "hostname1",
  "datapool": [{
    "mount_id": "etcd_data",
    "mount": "/data",
    "quotagroup": "etcd-guoxiang-05",
    "size": "600",
    "size_unit": "MB",
    "filesystem": "xfs"
  }],
  "ports": {
    "2379/tcp": "3379",
    "2380/tcp": "3380"
  },
  "env": {
    "LPAR_IP": "10.23.2.109",
    "ETCD_ADVERTISE_CLIENT_URLS": "https://10.23.2.109:3379",
    "ETCD_INITIAL_ADVERTISE_PEER_URLS": "https://10.23.2.109:3380",
    "ETCD_INITIAL_CLUSTER": "hostname1=https://10.23.2.109:3380,hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380",
    "ETCD_INITIAL_CLUSTER_STATE": "existing",
    "DATA_CENTER": "us-south",
    "PROFILE": "production"
  }
}
In ETCD_INITIAL_CLUSTER, "hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380" are the peer URLs of the members that remain after the destroyed member is removed; the hostname1 entry is the peer URL of the new container.
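If the replacement container gets a different peer URL, the ETCD_INITIAL_CLUSTER value has to be rebuilt by hand. The following is a minimal sketch of that string manipulation; the function name replace_peer and the address 10.23.2.111 in the usage note are hypothetical and used only for illustration.

```shell
# replace_peer <initial_cluster> <name> <new_peer_url>
# Drops the entry called <name> from a comma-separated name=url list and
# appends <name>=<new_peer_url>. Assumes the entries contain no shell
# glob characters (*, ?, [), which holds for the URLs in this article.
replace_peer() {
  local cluster=$1 name=$2 url=$3 entry out=
  local IFS=','
  for entry in $cluster; do
    # Skip the old entry of the member being replaced.
    if [ "${entry%%=*}" = "$name" ]; then
      continue
    fi
    out="${out:+$out,}$entry"
  done
  printf '%s\n' "${out:+$out,}$name=$url"
}
```

For example, replace_peer "$OLD_CLUSTER" hostname1 https://10.23.2.111:3380 would keep the hostname2 and hostname3 entries and append the new hostname1 entry at the end, where OLD_CLUSTER is assumed to hold the previous ETCD_INITIAL_CLUSTER value.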
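Add the new container to the existing cluster. The etcd runtime reconfiguration guide registers a member with member add before its process starts, so if the new container cannot join, run this command first and then start the container.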
# etcdctl --endpoint https://10.23.2.109:3379,https://10.23.2.108:3379,https://10.23.2.110:3379 --key-file ./client2-key.pem --cert-file ./client2.pem --ca-file ./ca.pem member add <name> <peerURL>
<name> is the hostname in the new container's config file.
<peerURL> is the value of ETCD_INITIAL_ADVERTISE_PEER_URLS in the new container's config file.
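The remove and add steps of Case 1 can be wrapped in a small script. This is only a sketch under stated assumptions: the helper names etcd_v2 and replace_member are hypothetical, ENDPOINTS lists only the two surviving members (the commands above list all three), and the certificate paths are copied from the commands above.

```shell
# Assumed client endpoints of the surviving members; adjust as needed.
ENDPOINTS="https://10.23.2.108:3379,https://10.23.2.110:3379"

# Run etcdctl with the endpoint and TLS flags used in this article.
etcd_v2() {
  etcdctl --endpoint "$ENDPOINTS" \
    --key-file ./client2-key.pem --cert-file ./client2.pem --ca-file ./ca.pem "$@"
}

# replace_member <old_member_id> <name> <peer_url>
# Removes the destroyed member, then registers its replacement.
# Stops without adding if the removal fails.
replace_member() {
  local old_id=$1 name=$2 peer_url=$3
  etcd_v2 member remove "$old_id" || return 1
  etcd_v2 member add "$name" "$peer_url"
}
```

With the values from this article the call would be: replace_member 5e9e9b11fe249dad hostname1 https://10.23.2.109:3380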
Case 2: The etcd container doesn’t exist.
Solution
Add "ETCD_INITIAL_CLUSTER_STATE": "existing" to the container creation config file. An example is as follows.
{
  "repository_id": "DBaaSEtcd",
  "image_tag": "0.2.0-guoxiang",
  "name": "etcd-re-create-guoxiang",
  "hostname": "hostname1",
  "datapool": [{
    "mount_id": "etcd_data",
    "mount": "/data",
    "quotagroup": "etcd-guoxiang-05",
    "size": "600",
    "size_unit": "MB",
    "filesystem": "xfs"
  }],
  "ports": {
    "2379/tcp": "3379",
    "2380/tcp": "3380"
  },
  "env": {
    "LPAR_IP": "10.23.2.109",
    "ETCD_ADVERTISE_CLIENT_URLS": "https://10.23.2.109:3379",
    "ETCD_INITIAL_ADVERTISE_PEER_URLS": "https://10.23.2.109:3380",
    "ETCD_INITIAL_CLUSTER": "hostname1=https://10.23.2.109:3380,hostname2=https://10.23.2.108:3380,hostname3=https://10.23.2.110:3380",
    "ETCD_INITIAL_CLUSTER_STATE": "existing",
    "DATA_CENTER": "us-south",
    "PROFILE": "production"
  }
}
Create the container with the new config file, keeping all other settings the same as before.
Case 3: The etcd container was stopped.
Solution
Start the container.
# docker start <container>
Case 4: The etcd service was stopped in its container.
Solution
Restart the container so that the etcd service inside it starts again.
# docker restart <container>
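Cases 2 to 4 can be told apart by the container's Docker status. The sketch below maps a status string to the matching case; the function name suggest_case is hypothetical, and the mapping is an assumption based on the four cases above. Case 1 cannot be detected this way, because a destroyed environment looks the same to Docker as a missing container.

```shell
# suggest_case <status>
# <status> is the container status as reported by Docker; an empty
# string means no container with that name was found.
suggest_case() {
  case "$1" in
    "")             echo "case 2: container does not exist, recreate it" ;;
    exited|created) echo "case 3: container is stopped, run docker start" ;;
    running)        echo "case 4: container is running, check etcd and run docker restart" ;;
    *)              echo "unknown status: $1" ;;
  esac
}
```

The status could be obtained with docker inspect --format '{{.State.Status}}' <container> 2>/dev/null and passed to suggest_case as its only argument.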
Case of unhealthy member
If a member is unhealthy, follow Case 2 above: remove its container together with its metadata, then create a new one.
