In a previous article we deployed a production Consul cluster on k8s. Today we continue by deploying a production Vault cluster on k8s.

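Vault can run in high availability (HA) mode, which protects against outages by running multiple Vault servers. Vault is usually bound by the I/O limits of its storage backend rather than by compute requirements. Some storage backends, such as Consul, provide the additional coordination features that allow Vault to run in an HA configuration, while others offer a more robust backup and restore process.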

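When running in HA mode, a Vault server has two additional states: standby and active. Within a Vault cluster only one instance is active and handles all requests (reads and writes); every standby node redirects requests to the active node.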

Deployment

We reuse the Consul cluster deployed in the previous article.

The Vault configuration file, server.hcl, looks like this:

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "POD_IP:8201"
  tls_disable     = "true"
}

storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}

api_addr     = "http://POD_IP:8200"
cluster_addr = "https://POD_IP:8201"

Next, create the ConfigMap:

kubectl create configmap vault --from-file=server.hcl

Note the POD_IP placeholder in the configuration file: when the container starts, sed replaces it with the pod's real IP.
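The substitution can be tried outside the cluster. The following is a minimal sketch of the same sed call; the function name render_config, the temporary template, and the address 10.0.0.5 are made up for illustration, whereas in the pod POD_IP comes from status.podIP.

```shell
#!/bin/sh
# Render a config template by replacing the POD_IP placeholder.
# render_config is a hypothetical helper name, not part of Vault.
render_config() {
  # ${POD_IP?} aborts with an error if POD_IP is unset, instead of
  # silently writing an empty address into the config.
  sed -E "s/POD_IP/${POD_IP?}/g" "$1"
}

# Example with a throwaway template (the IP below is invented).
tmpl=$(mktemp)
printf 'api_addr = "http://POD_IP:8200"\ncluster_addr = "https://POD_IP:8201"\n' > "$tmpl"
POD_IP=10.0.0.5
render_config "$tmpl"
rm -f "$tmpl"
```

Both lines of the template come out with 10.0.0.5 in place of POD_IP. Rendering into /tmp instead of editing in place matters because a ConfigMap mount is read-only.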

We deploy a two-node Vault cluster as a StatefulSet. The Consul client agent runs as a sidecar, so it and Vault share one Pod.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault
  labels:
    app: vault
spec:
  serviceName: vault
  podManagementPolicy: Parallel
  replicas: 2
  updateStrategy:
    type: OnDelete
  selector:
    matchLabels:
      app: vault
  template:
    metadata:
      labels:
        app: vault
    spec:
      affinity:
        # A single podAntiAffinity block with two terms: a repeated
        # podAntiAffinity key would let the second silently replace the first.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - consul
              topologyKey: kubernetes.io/hostname
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - vault
              topologyKey: kubernetes.io/hostname
      containers:
        - name: vault
          command:
            - "/bin/sh"
            - "-ec"
          args:
            - |
              sed -E "s/POD_IP/${POD_IP?}/g" /vault/config/server.hcl > /tmp/server.hcl;
              /usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/server.hcl
          image: "vault:1.4.2"
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: VAULT_ADDR
              value: "http://127.0.0.1:8200"
            - name: VAULT_API_ADDR
              value: "http://$(POD_IP):8200"
            - name: SKIP_CHOWN
              value: "true"
          volumeMounts:
            - name: vault-config
              mountPath: /vault/config/server.hcl
              subPath: server.hcl
          ports:
            - containerPort: 8200
              name: vault-port
              protocol: TCP
            - containerPort: 8201
              name: cluster-port
              protocol: TCP
          readinessProbe:
            # Check status; unsealed vault servers return 0
            # The exit code reflects the seal status:
            #   0 - unsealed
            #   1 - error
            #   2 - sealed
            exec:
              command: ["/bin/sh", "-ec", "vault status -tls-skip-verify"]
            failureThreshold: 2
            initialDelaySeconds: 5
            periodSeconds: 3
            successThreshold: 1
            timeoutSeconds: 5
          lifecycle:
            # Vault container doesn't receive SIGTERM from Kubernetes
            # and after the grace period ends, Kube sends SIGKILL.  This
            # causes issues with graceful shutdowns such as deregistering itself
            # from Consul (zombie services).
            preStop:
              exec:
                # Adding a sleep here to give the pod eviction a
                # chance to propagate, so requests will not be made
                # to this pod while it's terminating
                command:
                  - "/bin/sh"
                  - "-c"
                  - "sleep 5 && kill -SIGTERM $(pidof vault)"
        - name: consul-client
          image: consul:1.7.4
          env:
            - name: GOSSIP_ENCRYPTION_KEY
              valueFrom:
                secretKeyRef:
                  name: consul
                  key: gossip-encryption-key
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          args:
            - "agent"
            - "-advertise=$(POD_IP)"
            - "-config-file=/etc/consul/config/client.json"
            - "-encrypt=$(GOSSIP_ENCRYPTION_KEY)"
          volumeMounts:
            - name: consul-config
              mountPath: /etc/consul/config
            - name: consul-tls
              mountPath: /etc/tls
          lifecycle:
            preStop:
              exec:
                # (the command string after -c is cut off in the source)
                command:
                  - /bin/sh
                  - -c
      volumes:
        - name: vault-config
          configMap:
            defaultMode: 420
            name: vault
        - name: consul-config
          configMap:
            defaultMode: 420
            name: consul-client
        - name: consul-tls
          secret:
            secretName: consul
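If the pod network of your k8s cluster is flat, meaning pods and the hosts in the VPC can reach each other directly, the configuration above works as is. Otherwise you need to set hostNetwork: true on the pod.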
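The readiness probe in the StatefulSet relies on the exit code of vault status, using the mapping given in the probe's comments (0 unsealed, 1 error, 2 sealed). As a small sketch, that mapping can be written as a helper; the name seal_state is hypothetical.

```shell
#!/bin/sh
# Translate the exit code of `vault status` into a readable state.
# seal_state is an illustrative helper name, not a Vault command.
seal_state() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}

# Possible use inside the vault container:
#   vault status >/dev/null 2>&1; seal_state $?
seal_state 2
```

An exec probe counts any non-zero exit code as a failure, so a sealed pod is reported as not ready just like one that returned an error.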

Check the deployment:

kubectl get pods -l app=vault
NAME      READY   STATUS    RESTARTS   AGE
vault-0   2/2     Running   0          3m3s
vault-1   2/2     Running   0          3m3s

For completeness, here is the configuration file of the Consul client agent:

    {
        "bind_addr": "0.0.0.0",
        "client_addr": "0.0.0.0",
        "ca_file": "/etc/tls/ca.pem",
        "cert_file": "/etc/tls/consul.pem",
        "key_file": "/etc/tls/consul-key.pem",
        "data_dir": "/consul/data",
        "datacenter": "dc1",
        "domain": "cluster.consul",
        "server": false,
        "verify_incoming": true,
        "verify_outgoing": true,
        "verify_server_hostname": true,
        "retry_join": [
            "prod.discovery-01.xx.sg2.consul",
            "prod.discovery-02.xx.sg2.consul",
            "prod.discovery-03.xx.sg2.consul"
        ]
    }

prod.discovery-01.xx.sg2.consul and the other two names are our private domain names; they resolve to the three Consul instances deployed earlier.

Now Vault has to be initialized and each Vault instance unsealed.

First, exec into one of the Vault instances:

kubectl exec -it vault-0 -c vault -- sh

Run:

vault operator init
Unseal Key 1: 4uyvFnGT8WxM7OXXvFJh0ich8W/4yDh27MBBj
Unseal Key 2: RzbrhGbV4hA+MlxkzwtPRP7aGXA3UaK95+5eb
Unseal Key 3: hBIv4GiVkMvrWMDnxoW7m4MAYZqgX/xvwF1KS
Unseal Key 4: +KyBJREqU+1p4qao1red/i7EX0ASmzWP2Ch79
Unseal Key 5: 8v0Q3ZHvMi7QwsJxmH3ay8h7KrJAE3ESgh+qK

Initial Root Token: s.mbHbP3WOWGEpaCT8zaoVl

Vault initialized with 5 key shares and a key threshold of 3. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 3 of these keys to unseal it
before it can start servicing requests.

Vault does not store the generated master key. Without at least 3 key to
reconstruct the master key, Vault will remain permanently sealed!

It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.

Then unseal three times, each time with a different one of the Unseal Keys generated above:

vault operator unseal <unseal_key_1>
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    1/3
Unseal Nonce       3b5933b9-4120-5dcb-40df-afc8ab9e6563
Version            1.4.2
HA Enabled         true

vault operator unseal <unseal_key_2>
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       5
Threshold          3
Unseal Progress    2/3
Unseal Nonce       3b5933b9-4120-5dcb-40df-afc8ab9e6563
Version            1.4.2
HA Enabled         true

vault operator unseal <unseal_key_3>
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           5
Threshold              3
Version                1.4.2
Cluster Name           vault-cluster-b9554129
Cluster ID             e6cedfdd-07d2-520a-9a7c-c4e857803c7e
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>

Check the status at this point; the instance has become the active node:

vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         1.4.2
Cluster Name    vault-cluster-b9554129
Cluster ID      e6cedfdd-07d2-520a-9a7c-c4e857803c7e
HA Enabled      true
HA Cluster      https://10.xx.xx.229:8201
HA Mode         active

Next, switch to the other instance and unseal it three times with the same keys.
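The repeated unseal steps can be scripted from outside the pods. This is only a sketch under assumptions: unseal_all is a hypothetical helper, the pod names follow the StatefulSet naming (vault-0, vault-1), and the keys are passed as arguments for brevity, which exposes them to the shell history and process list.

```shell
#!/bin/sh
# Unseal every listed Vault pod with the same key shares.
# Usage: unseal_all "<space-separated pod names>" <key1> <key2> <key3>
# unseal_all is an illustrative helper, not part of kubectl or Vault.
unseal_all() {
  pods=$1
  shift
  for pod in $pods; do
    for key in "$@"; do
      # One `vault operator unseal` call per key share, run in the vault container.
      kubectl exec "$pod" -c vault -- vault operator unseal "$key" || return 1
    done
  done
}

# Example call (placeholders, not real keys):
#   unseal_all "vault-0 vault-1" "<unseal_key_1>" "<unseal_key_2>" "<unseal_key_3>"
```

In practice the key shares are held by different people, so a loop like this is mainly useful in test environments.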

Finally, check the status of this instance; it is a standby and points at the active node:

vault status
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           5
Threshold              3
Version                1.4.2
Cluster Name           vault-cluster-b9554129
Cluster ID             e6cedfdd-07d2-520a-9a7c-c4e857803c7e
HA Enabled             true
HA Cluster             https://10.xx.3.229:8201
HA Mode                standby
Active Node Address    http://10.xx.3.229:8200
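To see at a glance which pod is active, the HA Mode line can be pulled out of the vault status text. A minimal sketch, assuming the table layout shown above; ha_mode is a hypothetical helper name.

```shell
#!/bin/sh
# Print the "HA Mode" value (active or standby) from `vault status`
# output read on standard input. ha_mode is an illustrative name.
ha_mode() {
  awk '$1 == "HA" && $2 == "Mode" { print $3 }'
}

# Possible use from a workstation with kubectl access:
#   for pod in vault-0 vault-1; do
#     printf '%s: ' "$pod"
#     kubectl exec "$pod" -c vault -- vault status | ha_mode
#   done
```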

Finally, create the Service:

apiVersion: v1
kind: Service
metadata:
  name: vault
  labels:
    app: vault
spec:
  type: ClusterIP
  ports:
    - port: 8200
      targetPort: 8200
      protocol: TCP
      name: vault
  selector:
    app: vault
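Once the Service exists, a quick check through it is Vault's health endpoint, /v1/sys/health, whose HTTP status code reflects the node state. The sketch below maps the default codes to names; health_state is a hypothetical helper, and the curl call assumes a client pod in the same namespace that has curl installed.

```shell
#!/bin/sh
# Map the HTTP status code of GET /v1/sys/health to a node state,
# using Vault's default codes for that endpoint.
# health_state is an illustrative helper name.
health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown ($1)" ;;
  esac
}

# Possible use from a pod in the same namespace:
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://vault:8200/v1/sys/health)
#   health_state "$code"
```

Because the Service selects every ready Vault pod, repeated calls may report either active or standby depending on which pod answers.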

Summary

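  • Highly available deployments need anti-affinity rules; here we set anti-affinity among the Vault pods themselves as well as between Vault and Consul.
  • Because PID 1 in the container is sh rather than vault, we have to implement graceful shutdown ourselves through a preStop hook.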