Preface

Almost all of the many Kubernetes components are stateless: all cluster data lives in etcd, so etcd is effectively the database of the whole k8s cluster. That makes it critical, and backing up etcd data properly is essential. This post describes how we do it at our company.

Backing up etcd data to S3

There are many ways to back up etcd, but they are largely the same: nearly all of them are built on the etcdctl command. Why S3? First, our company uses AWS heavily. Second, we want backups to land in highly available storage rather than on the machines running etcd. Finally, S3 supports lifecycle rules: once a rule is configured, AWS deletes old backups automatically and keeps the recent ones.

Our approach

We essentially use the etcd-backup project. We forked it and made a small change, mainly to the Dockerfile, so that the bundled etcdctl matches the etcd version we run in production. The modified Dockerfile:

```dockerfile
FROM alpine:3.8

RUN apk add --no-cache curl

# Get etcdctl
ENV ETCD_VER=v3.2.24
RUN \
  cd /tmp && \
  curl -L https://storage.googleapis.com/etcd/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz | \
  tar xz -C /usr/local/bin --strip-components=1

COPY ./etcd-backup /

ENTRYPOINT ["/etcd-backup"]
CMD ["-h"]
```

After that, build and push the image with docker build and the like.

Deploying on k8s

In Kubernetes a CronJob is the natural fit. Our backup policy is one backup every three hours. cronjob.yaml:

```yaml
# On Kubernetes 1.21+ use batch/v1; batch/v1beta1 CronJob was removed in 1.25.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  # Minute 0 of every third hour ("* */3 * * *" would fire every minute of those hours).
  schedule: "0 */3 * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      # Job timeout
      activeDeadlineSeconds: 300
      template:
        spec:
          tolerations:
          # Tolerate master taint
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          # Container creates etcd backups.
          # Run container in host network mode on master nodes
          # to be able to use 127.0.0.1 as etcd address.
          # For etcd v2 backups the container needs access
          # to the etcd data directory. To achieve that,
          # mount /var/lib/etcd as a volume.
          nodeSelector:
            node-role.kubernetes.io/master: ""
          containers:
          - name: etcd-backup
            image: iyacontrol/etcd-backup:0.1
            args:
            # Back up guest clusters only on production installations;
            # testing installations can have many broken guest clusters.
            - -prefix=k8s-prod-1
            - -etcd-v2-datadir=/var/lib/etcd
            - -etcd-v3-endpoints=https://172.xx.xx.221:2379,https://172.xx.xx.83:2379,https://172.xx.xx.246:2379
            - -etcd-v3-cacert=/certs/ca.crt
            - -etcd-v3-cert=/certs/server.crt
            - -etcd-v3-key=/certs/server.key
            - -aws-s3-bucket=mybucket
            - -aws-s3-region=us-east-1
            volumeMounts:
            - mountPath: /var/lib/etcd
              name: etcd-datadir
            - mountPath: /certs
              name: etcd-certs
            env:
            - name: ETCDBACKUP_AWS_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_AWS_ACCESS_KEY
            - name: ETCDBACKUP_AWS_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_AWS_SECRET_KEY
            - name: ETCDBACKUP_PASSPHRASE
              valueFrom:
                secretKeyRef:
                  name: etcd-backup
                  key: ETCDBACKUP_PASSPHRASE
          volumes:
          - name: etcd-datadir
            hostPath:
              path: /var/lib/etcd
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd/
          # Do not restart the pod; the Job takes care of restarting a failed pod.
          restartPolicy: Never
          hostNetwork: true
```

Note: the toleration and the nodeSelector work together to schedule the pod onto the master nodes.

Then secret.yaml (replace the placeholders with your own base64-encoded credentials):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: etcd-backup
  namespace: kube-system
type: Opaque
data:
  # Values under data: must be base64-encoded.
  ETCDBACKUP_AWS_ACCESS_KEY: <base64-encoded AWS access key ID>
  ETCDBACKUP_AWS_SECRET_KEY: <base64-encoded AWS secret access key>
  ETCDBACKUP_PASSPHRASE: ""
```

Summary

Before this, we tried etcd-operator for backups. In practice it did not work well for us: too many concepts, complex components, and much of the code is written too rigidly. We finally chose etcd-backup, mainly because it is simple: less is more. It is written in Go, and from reading the source, extending it for our own needs is fairly easy.
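As noted in the S3 section, most etcd backup tools boil down to etcdctl. The Go sketch below shows the kind of v3 snapshot call such a tool wraps. It assumes an etcdctl v3.x binary on PATH and reuses the certificate paths mounted in cronjob.yaml; `snapshotArgs` is a hypothetical helper, not etcd-backup's actual code. `etcdctl snapshot save` must target a single endpoint, so only one is passed here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// snapshotArgs builds etcdctl v3 arguments for "snapshot save".
// Hypothetical helper for illustration, not etcd-backup's actual code.
// "snapshot save" must be sent to exactly one endpoint.
func snapshotArgs(endpoint, cacert, cert, key, outFile string) []string {
	return []string{
		"--endpoints=" + endpoint,
		"--cacert=" + cacert,
		"--cert=" + cert,
		"--key=" + key,
		"snapshot", "save", outFile,
	}
}

func main() {
	// Paths mirror the volume mounts in cronjob.yaml. 127.0.0.1 is assumed
	// to work because the pod runs with hostNetwork on a master node.
	args := snapshotArgs(
		"https://127.0.0.1:2379",
		"/certs/ca.crt",
		"/certs/server.crt",
		"/certs/server.key",
		"/tmp/etcd-snapshot.db",
	)
	fmt.Println("ETCDCTL_API=3 etcdctl " + strings.Join(args, " "))

	if _, err := exec.LookPath("etcdctl"); err != nil {
		fmt.Fprintln(os.Stderr, "etcdctl not found in PATH; printed the command only")
		return
	}
	cmd := exec.Command("etcdctl", args...)
	// etcdctl 3.2 defaults to the v2 API, so request v3 explicitly.
	cmd.Env = append(os.Environ(), "ETCDCTL_API=3")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "snapshot failed:", err)
		os.Exit(1)
	}
}
```

The resulting snapshot file is what would then be uploaded to the S3 bucket.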
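The CronJob schedule for "every three hours" is `0 */3 * * *`, not `* */3 * * *`: a `*` in the minute field matches every minute of each matching hour. The hypothetical helper below counts daily runs for both schedules. It is a deliberately small sketch that understands only `*`, `*/N` and a plain number in the minute and hour fields, and requires the remaining three fields to be `*`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// expandField returns the values one cron field matches within [lo, hi].
// Sketch only: supports "*", "*/N" and a single number.
func expandField(field string, lo, hi int) ([]int, error) {
	step := 1
	base := field
	if i := strings.Index(field, "/"); i >= 0 {
		s, err := strconv.Atoi(field[i+1:])
		if err != nil || s <= 0 {
			return nil, fmt.Errorf("bad step in %q", field)
		}
		step, base = s, field[:i]
		if base != "*" {
			return nil, fmt.Errorf("unsupported field %q", field)
		}
	}
	if base == "*" {
		var vals []int
		for v := lo; v <= hi; v += step {
			vals = append(vals, v)
		}
		return vals, nil
	}
	n, err := strconv.Atoi(base)
	if err != nil || n < lo || n > hi {
		return nil, fmt.Errorf("unsupported field %q", field)
	}
	return []int{n}, nil
}

// runsPerDay counts how often a schedule fires per day, assuming the
// day-of-month, month and day-of-week fields are all "*".
func runsPerDay(schedule string) (int, error) {
	f := strings.Fields(schedule)
	if len(f) != 5 {
		return 0, fmt.Errorf("want 5 fields, got %d", len(f))
	}
	for _, x := range f[2:] {
		if x != "*" {
			return 0, fmt.Errorf("sketch only handles \"*\" here, got %q", x)
		}
	}
	mins, err := expandField(f[0], 0, 59)
	if err != nil {
		return 0, err
	}
	hours, err := expandField(f[1], 0, 23)
	if err != nil {
		return 0, err
	}
	return len(mins) * len(hours), nil
}

func main() {
	// Prints 480 runs for "* */3 * * *" and 8 runs for "0 */3 * * *".
	for _, s := range []string{"* */3 * * *", "0 */3 * * *"} {
		n, err := runsPerDay(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%q fires %d times per day\n", s, n)
	}
}
```

With `* */3 * * *` and a 300-second job deadline, backup jobs would start every minute during each matching hour, which is why the schedule pins the minute field to 0.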
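Values under the Secret's `data:` key must be base64-encoded, and a common pitfall is encoding the output of plain `echo`, which appends a newline that then becomes part of the credential. This small Go sketch shows the difference; `encodeSecretValue` is a hypothetical name for a plain `base64.StdEncoding` call, and `echo -n 'value' | base64` does the same in a shell.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
)

// encodeSecretValue returns the base64 form expected under a Secret's data: key.
// Hypothetical helper; it is just standard base64 with padding.
func encodeSecretValue(v string) string {
	return base64.StdEncoding.EncodeToString([]byte(v))
}

func main() {
	values := os.Args[1:]
	if len(values) == 0 {
		// Demo: the same value with and without the newline plain `echo` appends.
		values = []string{"abc", "abc\n"}
	}
	for _, v := range values {
		fmt.Printf("%q -> %s\n", v, encodeSecretValue(v))
	}
	// With the demo values this prints:
	// "abc" -> YWJj
	// "abc\n" -> YWJjCg==
}
```

Alternatively, `kubectl create secret generic etcd-backup -n kube-system --from-literal=ETCDBACKUP_AWS_ACCESS_KEY=<access-key-id>` encodes the values for you, with no trailing newline.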