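In a previous article, we deployed a production Consul cluster on k8s. Today I will continue by deploying a production Vault cluster on k8s.
Vault can run in high availability (HA) mode, which protects against outages by running multiple Vault servers. Vault is typically bound by the IO limits of its storage backend rather than by compute requirements. Some storage backends, such as Consul, provide additional coordination functions that allow Vault to run in an HA configuration, while others provide a more robust backup and restore process.
When running in HA mode, Vault servers have two additional states: standby and active. Within a Vault cluster, only a single instance is active and handles all requests (reads and writes), and all standby nodes redirect requests to the active node.
Deployment
We reuse the Consul cluster deployed in the previous article.
The Vault configuration file server.hcl is as follows: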
listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "POD_IP:8201"
  tls_disable = "true"
}
storage "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
}
api_addr = "http://POD_IP:8200"
cluster_addr = "https://POD_IP:8201"
Next, we create the configmap:
kubectl create configmap vault --from-file=server.hcl
Note the POD_IP placeholder in the configuration file: when the container starts, we use sed to replace it with the pod's real IP.
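To make the substitution concrete, here is a minimal sketch that runs the same sed expression outside the cluster. The function name `render_config`, the temporary file, and the address 10.0.0.12 are made up for illustration; only the sed expression itself comes from the deployment.

```shell
#!/bin/sh
# render_config TEMPLATE POD_IP   (hypothetical helper, for illustration only)
# Prints TEMPLATE with every literal "POD_IP" replaced by the given address,
# using the same sed expression as the container's startup script.
render_config() {
  # ${2?...} aborts with an error if the second argument is not supplied,
  # so a config with an unreplaced placeholder is never produced silently.
  sed -E "s/POD_IP/${2?missing pod IP}/g" "$1"
}

# Demo with a throwaway template and a made-up address.
tmp="$(mktemp)"
printf '%s\n' \
  'api_addr = "http://POD_IP:8200"' \
  'cluster_addr = "https://POD_IP:8201"' > "$tmp"
render_config "$tmp" "10.0.0.12"
rm -f "$tmp"
```

In the real pod the address comes from the POD_IP environment variable, which the downward API fills from status.podIP.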
We deploy a two-node Vault cluster as a StatefulSet. The Consul client agent and Vault run in the same Pod, with the agent as a sidecar.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault
  labels:
    app: vault
spec:
  serviceName: vault
  podManagementPolicy: Parallel
  replicas: 2
  updateStrategy:
    type: OnDelete
  selector:
    matchLabels:
      app: vault
  template:
    metadata:
      labels:
        app: vault
    spec:
      affinity:
        # Both rules belong to a single podAntiAffinity key:
        # keep away from consul pods and from other vault pods.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - consul
              topologyKey: kubernetes.io/hostname
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - vault
              topologyKey: kubernetes.io/hostname
      containers:
        - name: vault
          command:
            - "/bin/sh"
            - "-ec"
          args:
            - |
              sed -E "s/POD_IP/${POD_IP?}/g" /vault/config/server.hcl > /tmp/server.hcl;
              /usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/server.hcl
          image: "vault:1.4.2"
          imagePullPolicy: IfNotPresent
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: VAULT_ADDR
              value: "http://127.0.0.1:8200"
            - name: VAULT_API_ADDR
              value: "http://$(POD_IP):8200"
            - name: SKIP_CHOWN
              value: "true"
          volumeMounts:
            - name: vault-config
              mountPath: /vault/config/server.hcl
              subPath: server.hcl
          ports:
            - containerPort: 8200
              name: vault-port
              protocol: TCP
            - containerPort: 8201
              name: cluster-port
              protocol: TCP
          readinessProbe:
            # Check status; unsealed vault servers return 0
            # The exit code reflects the seal status:
            #   0 - unsealed
            #   1 - error
            #   2 - sealed
            exec:
              command: ["/bin/sh", "-ec", "vault status -tls-skip-verify"]
            failureThreshold: 2
            initialDelaySeconds: 5
            periodSeconds: 3
            successThreshold: 1
            timeoutSeconds: 5
          lifecycle:
            # Vault container doesn't receive SIGTERM from Kubernetes
            # and after the grace period ends, Kube sends SIGKILL. This
            # causes issues with graceful shutdowns such as deregistering itself
            # from Consul (zombie services).
            preStop:
              exec:
                command: [
                  "/bin/sh", "-c",
                  # Adding a sleep here to give the pod eviction a
                  # chance to propagate, so requests will not be made
                  # to this pod while it's terminating
                  "sleep 5 && kill -SIGTERM $(pidof vault)",
                ]
        - name: consul-client
          image: consul:1.7.4
          env:
            - name: GOSSIP_ENCRYPTION_KEY
              valueFrom:
                secretKeyRef:
                  name: consul
                  key: gossip-encryption-key
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          args:
            - "agent"
            - "-advertise=$(POD_IP)"
            - "-config-file=/etc/consul/config/client.json"
            - "-encrypt=$(GOSSIP_ENCRYPTION_KEY)"
          volumeMounts:
            - name: consul-config
              mountPath: /etc/consul/config
            - name: consul-tls
              mountPath: /etc/tls
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - consul leave
      volumes:
        - name: vault-config
          configMap:
            defaultMode: 420
            name: vault
        - name: consul-config
          configMap:
            defaultMode: 420
            name: consul-client
        - name: consul-tls
          secret:
            secretName: consul
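If the pod network of your k8s cluster is flat, so that pods and hosts in the VPC can reach each other, the configuration above is sufficient. Otherwise, you need to set hostNetwork: true on the pod.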
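If you do need host networking, one possible way to switch it on for an existing StatefulSet is a merge patch. This is only a sketch: the helper name `host_network_patch` is made up, and adding `dnsPolicy: ClusterFirstWithHostNet` is my assumption for keeping cluster DNS available to host-network pods, not something the original setup specifies.

```shell
#!/bin/sh
# host_network_patch (hypothetical helper)
# Prints a JSON merge patch that enables hostNetwork on the pod template.
# The dnsPolicy value is an assumption, see the note above.
host_network_patch() {
  printf '%s' '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'
}

# Apply it against the cluster (not executed here):
#   kubectl patch statefulset vault --type merge -p "$(host_network_patch)"
```

Because the StatefulSet uses the OnDelete update strategy, existing pods only pick up the change after they are deleted and recreated.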
Check the deployment:
kubectl get pods -l app=vault
NAME READY STATUS RESTARTS AGE
vault-0 2/2 Running 0 3m3s
vault-1 2/2 Running 0 3m3s
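The readiness probe in the StatefulSet relies on the exit code of `vault status`. When debugging a pod by hand, a small helper like the following can translate that code into words. The function name is made up, and treating every code other than 0 and 2 as an error is my simplification of the probe's comment.

```shell
#!/bin/sh
# describe_status_code CODE   (hypothetical helper)
# Maps the exit code of `vault status` to the meaning listed in the
# readiness probe comment: 0 unsealed, 1 error, 2 sealed.
describe_status_code() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;   # 1, plus any unexpected code (assumption)
  esac
}

# Usage inside the vault container (not executed here):
#   vault status >/dev/null 2>&1; describe_status_code "$?"
```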
For completeness, here is the configuration file of the consul client agent:
{
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "ca_file": "/etc/tls/ca.pem",
  "cert_file": "/etc/tls/consul.pem",
  "key_file": "/etc/tls/consul-key.pem",
  "data_dir": "/consul/data",
  "datacenter": "dc1",
  "domain": "cluster.consul",
  "server": false,
  "verify_incoming": true,
  "verify_outgoing": true,
  "verify_server_hostname": true,
  "retry_join": [
    "prod.discovery-01.xx.sg2.consul",
    "prod.discovery-02.xx.sg2.consul",
    "prod.discovery-03.xx.sg2.consul"
  ]
}
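prod.discovery-01.xx.sg2.consul and its two siblings are our private domain names; they resolve to the three consul instances deployed earlier.
Now we need to initialize Vault and unseal each instance.
First, exec into one of the vault instances:
kubectl exec -it vault-0 -c vault -- sh
Then run: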
vault operator init
Unseal Key 1: 4uyvFnGT8WxM7OXXvFJh0ich8W/4yDh27MBBj
Unseal Key 2: RzbrhGbV4hA+MlxkzwtPRP7aGXA3UaK95+5eb
Unseal Key 3: hBIv4GiVkMvrWMDnxoW7m4MAYZqgX/xvwF1KS
Unseal Key 4: +KyBJREqU+1p4qao1red/i7EX0ASmzWP2Ch79
Unseal Key 5: 8v0Q3ZHvMi7QwsJxmH3ay8h7KrJAE3ESgh+qK
Initial Root Token: s.mbHbP3WOWGEpaCT8zaoVl
Vault initialized with 5 key shares and a key threshold of 3. Please securely
distribute the key shares printed above. When the Vault is re-sealed,
restarted, or stopped, you must supply at least 3 of these keys to unseal it
before it can start servicing requests.
Vault does not store the generated master key. Without at least 3 key to
reconstruct the master key, Vault will remain permanently sealed!
It is possible to generate new unseal keys, provided you have a quorum of
existing unseal keys shares. See "vault operator rekey" for more information.
Next, unseal three times, each time with a different one of the Unseal Keys generated above:
vault operator unseal <unseal_key_1>
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed true
Total Shares 5
Threshold 3
Unseal Progress 1/3
Unseal Nonce 3b5933b9-4120-5dcb-40df-afc8ab9e6563
Version 1.4.2
HA Enabled true
vault operator unseal <unseal_key_2>
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed true
Total Shares 5
Threshold 3
Unseal Progress 2/3
Unseal Nonce 3b5933b9-4120-5dcb-40df-afc8ab9e6563
Version 1.4.2
HA Enabled true
vault operator unseal <unseal_key_3>
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed false
Total Shares 5
Threshold 3
Version 1.4.2
Cluster Name vault-cluster-b9554129
Cluster ID e6cedfdd-07d2-520a-9a7c-c4e857803c7e
HA Enabled true
HA Cluster n/a
HA Mode standby
Active Node Address <none>
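The three unseal calls can also be wrapped in a small loop. This is a sketch rather than a recommended procedure: the function name and the `VAULT_BIN` override (added so the loop can be dry-run without a Vault server) are hypothetical, while `vault operator unseal <key>` is the same command used above.

```shell
#!/bin/sh
# unseal_with_keys KEY...   (hypothetical helper)
# Passes each key share to `vault operator unseal`, stopping at the
# first failure. VAULT_BIN is an illustrative override for dry runs.
unseal_with_keys() {
  for key in "$@"; do
    "${VAULT_BIN:-vault}" operator unseal "$key" || return 1
  done
}

# Usage inside the vault container, with any three of the five shares
# (not executed here):
#   unseal_with_keys "$KEY_1" "$KEY_2" "$KEY_3"
```

Keys passed as arguments can end up in shell history and process listings. Running `vault operator unseal` without an argument prompts for the key instead, which is the safer choice for real key shares.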
Now check the status:
vault status
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed false
Total Shares 5
Threshold 3
Version 1.4.2
Cluster Name vault-cluster-b9554129
Cluster ID e6cedfdd-07d2-520a-9a7c-c4e857803c7e
HA Enabled true
HA Cluster https://10.xx.xx.229:8201
HA Mode active
Next, move on to the other instance and unseal it three times with the same keys.
Finally, check the status:
vault status
Key Value
--- -----
Seal Type shamir
Initialized true
Sealed false
Total Shares 5
Threshold 3
Version 1.4.2
Cluster Name vault-cluster-b9554129
Cluster ID e6cedfdd-07d2-520a-9a7c-c4e857803c7e
HA Enabled true
HA Cluster https://10.xx.3.229:8201
HA Mode standby
Active Node Address http://10.xx.3.229:8200
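Besides `vault status`, the role of a node can be read from the HTTP status code of Vault's `sys/health` endpoint, which by default answers 200 on the active node, 429 on an unsealed standby, 501 when uninitialized and 503 when sealed. The helper below is a minimal sketch of that mapping; the function name is made up, and the curl call in the comment assumes curl is available wherever you run it.

```shell
#!/bin/sh
# vault_state_from_health HTTP_CODE   (hypothetical helper)
# Translates the default status codes of /v1/sys/health into a node state.
vault_state_from_health() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# Usage against a reachable node (not executed here):
#   code="$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8200/v1/sys/health)"
#   vault_state_from_health "$code"
```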
Finally, create the svc:
apiVersion: v1
kind: Service
metadata:
  name: vault
  labels:
    app: vault
spec:
  type: ClusterIP
  ports:
    - port: 8200
      targetPort: 8200
      protocol: TCP
      name: vault
  selector:
    app: vault
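Summary
- Highly available deployments need some anti-affinity settings. Here we set anti-affinity among the vault pods, and between the vault pods and the consul pods.
- Because the process we run as PID 1 is sh, we have to implement graceful shutdown ourselves through preStop.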
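As background for the last point, the sketch below shows the general shell behaviour behind it, independent of Vault: a command started with `exec` replaces the shell and keeps its PID, whereas a command started normally runs as a child with a new PID. The function name is made up. Whether putting `exec` in front of the entrypoint call would be enough for this particular image is not verified here, so the preStop hook remains the approach used in this article.

```shell
#!/bin/sh
# pids_with_exec (hypothetical demo)
# Prints two PIDs: the wrapper shell's own, then the PID of the shell it
# exec's. Both lines show the same number because exec replaces the
# process instead of forking a child.
pids_with_exec() {
  sh -c 'echo $$; exec sh -c "echo \$\$"'
}

pids_with_exec
```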