As a CNCF member project, Weave Flagger provides continuous integration and continuous delivery capabilities. Flagger groups progressive delivery into three categories:

Canary releases (Canary): progressive traffic shifting to a canary version
A/B testing (A/B Testing): routing user requests to A and B versions based on request information (HTTP headers and cookies traffic routing)
Blue/Green releases (Blue/Green): traffic switching and mirroring

This article walks through progressive canary releases in practice with Flagger on ASM.

Setup Flagger
1 Deploy Flagger
Run the following commands to deploy Flagger (see demo_canary.sh for the complete script):

alias k="kubectl --kubeconfig $USER_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

cp $MESH_CONFIG kubeconfig
k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
k -n istio-system label secret istio-kubeconfig istio/multiCluster=true

h repo add flagger https://flagger.app
h repo update
k apply -f $FLAAGER_SRC/artifacts/flagger/crd.yaml
h upgrade -i flagger flagger/flagger --namespace=istio-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://prometheus:9090 \
    --set istio.kubeconfig.secretName=istio-kubeconfig \
    --set istio.kubeconfig.key=kubeconfig
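
To confirm the controller came up cleanly, you can wait for its Deployment to become available (a quick sanity check; the Deployment is named flagger after the Helm release above):

k -n istio-system rollout status deployment/flagger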

2 Deploy the Gateway
During a canary release, Flagger asks ASM to update the VirtualService used for canary traffic configuration, and that VirtualService references the Gateway named public-gateway. We therefore create the Gateway configuration file public-gateway.yaml as follows:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

Run the following command to deploy the Gateway:

kubectl --kubeconfig "$MESH_CONFIG" apply -f resources_canary/public-gateway.yaml
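
Optionally, verify that the Gateway object exists on the mesh side; once a rollout starts, you can also inspect the VirtualService named podinfo that Flagger manages in the test namespace:

kubectl --kubeconfig "$MESH_CONFIG" -n istio-system get gateway public-gateway
kubectl --kubeconfig "$MESH_CONFIG" -n test get virtualservice podinfo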
3 Deploy flagger-loadtester
flagger-loadtester is the application used during the canary analysis phase to probe the canary pod instances.

Run the following command to deploy flagger-loadtester:

kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"
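
A quick way to confirm the load tester is ready before moving on (the Deployment name comes from the manifest applied above):

kubectl --kubeconfig "$USER_CONFIG" -n test rollout status deployment/flagger-loadtester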
4 Deploy PodInfo and Its HPA
We first use the HPA configuration that ships with the Flagger distribution (an ops-level HPA). Once the full workflow is in place, we will switch to an application-level HPA.

Run the following command to deploy PodInfo and its HPA:

kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"
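
To confirm both objects were created (the HPA is named podinfo, as shown in its full configuration later in this article):

kubectl --kubeconfig "$USER_CONFIG" -n test get deployment/podinfo hpa/podinfo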
Progressive Canary Release
1 Deploy the Canary
The Canary resource is the core CRD for canary releases with Flagger; see How it works for details. We first deploy the Canary configuration file podinfo-canary.yaml below to complete the full progressive canary workflow, and then build on it by introducing application-level metrics to achieve application-aware progressive delivery.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - '*'
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

Run the following command to deploy the Canary:

kubectl --kubeconfig "$USER_CONFIG" apply -f resources_canary/podinfo-canary.yaml
After the Canary is deployed, Flagger clones the Deployment named podinfo into podinfo-primary and scales podinfo-primary up to the minimum pod count defined by the HPA. It then gradually scales the Deployment named podinfo down to zero pods. That is, podinfo serves as the canary-version Deployment, while podinfo-primary serves as the production-version Deployment.

At the same time, Flagger creates three services: podinfo, podinfo-primary, and podinfo-canary. The first two point to the podinfo-primary Deployment, and the last points to the podinfo Deployment.
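You can observe this initialization directly (a quick check; the object names follow from the description above):

kubectl --kubeconfig "$USER_CONFIG" -n test get deploy podinfo podinfo-primary
kubectl --kubeconfig "$USER_CONFIG" -n test get svc podinfo podinfo-primary podinfo-canary
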

2 Upgrade podinfo
Run the following command to upgrade the canary Deployment from version 3.1.0 to 3.1.1:

kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
3 Progressive Canary Release
At this point, Flagger starts executing the progressive canary workflow described in the first article of this series. To recap, the main steps are:

Gradually scale up the canary pods and validate them
Progressively shift traffic and validate
Roll out the production Deployment and validate
Shift 100% of the traffic back to production
Scale the canary pods down to zero

We can observe this progressive traffic shifting with the following command:

while true; do kubectl --kubeconfig "$USER_CONFIG" -n test describe canary/podinfo; sleep 10s;done
The output log looks like the following:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Synced 39m flagger podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
Normal Synced 38m (x2 over 39m) flagger all the metrics providers are available!
Normal Synced 38m flagger Initialization done! podinfo.test
Normal Synced 37m flagger New revision detected! Scaling up podinfo.test
Normal Synced 36m flagger Starting canary analysis for podinfo.test
Normal Synced 36m flagger Pre-rollout check acceptance-test passed
Normal Synced 36m flagger Advance podinfo.test canary weight 10
Normal Synced 35m flagger Advance podinfo.test canary weight 20
Normal Synced 34m flagger Advance podinfo.test canary weight 30
Normal Synced 33m flagger Advance podinfo.test canary weight 40
Normal Synced 29m (x4 over 32m) flagger (combined from similar events): Promotion completed! Scaling down podinfo.test
The corresponding Kiali view (optional) is shown in the figure below:

With that, we have completed a full progressive canary workflow. What follows is extended reading.

Application-Level Scaling During the Canary Release
Building on the progressive canary workflow above, let's now look at the HPA part of the Canary configuration:

autoscalerRef:
  apiVersion: autoscaling/v2beta2
  kind: HorizontalPodAutoscaler
  name: podinfo

This HPA named podinfo is the configuration that ships with Flagger; it scales up when the canary Deployment's CPU utilization reaches 99%. The full configuration is as follows:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale up if usage is above
          # 99% of the requested CPU (100m)
          averageUtilization: 99
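
As a reminder of how the HPA acts on this target, the controller applies the standard Kubernetes scaling formula. For example, if 2 replicas average 150% CPU utilization against the 99% target (an illustrative value):

desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
                = ceil(2 * 150 / 99)
                = 4    (which is already maxReplicas: 4)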

We walked through application-level scaling in practice in an earlier article of this series; here we apply it to the canary release process.

1 An HPA Aware of Application QPS
Run the following command to deploy an HPA that is aware of the application's request volume and scales up when QPS reaches 10 (see advanced_canary.sh for the complete script):

kubectl --kubeconfig "$USER_CONFIG" apply -f resources_hpa/requests_total_hpa.yaml
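
The contents of resources_hpa/requests_total_hpa.yaml are not reproduced in this article. Judging from the HPA status shown in the experiment output later (1-5 replicas, target 10 average), it is presumably a pod-metric HPA along the following lines. This is a hypothetical sketch: the metric name istio_requests_per_second and its exposure through a Prometheus adapter are assumptions, not the article's actual file:

kubectl --kubeconfig "$USER_CONFIG" apply -f - <<EOF
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo-total
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          # hypothetical metric name; assumes request-rate telemetry
          # exposed as a pod metric via a Prometheus adapter
          name: istio_requests_per_second
        target:
          type: AverageValue
          averageValue: "10"
EOF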
Accordingly, the Canary configuration is updated to:

autoscalerRef:
  apiVersion: autoscaling/v2beta2
  kind: HorizontalPodAutoscaler
  name: podinfo-total

2 Upgrade podinfo
Run the following command to upgrade the canary Deployment from version 3.1.0 to 3.1.1:

kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1
3 Verify the Progressive Canary Release and HPA
Watch the progressive traffic shifting with the following command:

while true; do k -n test describe canary/podinfo; sleep 10s;done
During the progressive canary release (after the message Advance podinfo.test canary weight 10 appears; see the figure below), use the following commands to send requests through the ingress gateway and raise the QPS:

INGRESS_GATEWAY=$(kubectl --kubeconfig $USER_CONFIG -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
hey -z 20m -c 2 -q 10 http://$INGRESS_GATEWAY
Watch the progress of the canary release with:

watch kubectl --kubeconfig $USER_CONFIG get canaries --all-namespaces
Watch the HPA-driven replica count changes with:

watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total
The result is shown in the figure below: during the progressive canary release, at one point after traffic had shifted to 30%, the canary Deployment's replica count reached 4:

Application-Level Metrics During the Canary Release
Building on the application-level scaling above, let's finally look at the metrics part of the Canary configuration:

analysis:
  metrics:
  - name: request-success-rate
    # minimum req success rate (non 5xx responses)
    # percentage (0-100)
    thresholdRange:
      min: 99
    interval: 1m
  - name: request-duration
    # maximum req duration P99
    # milliseconds
    thresholdRange:
      max: 500
    interval: 30s
  # testing (optional)

1 Flagger's Built-in Metrics
So far, the metrics configuration in our Canary has used Flagger's two built-in metrics: request success rate (request-success-rate) and request duration (request-duration). The figure below shows how Flagger defines these built-in metrics for each platform; for Istio, it uses the Mixerless Telemetry data introduced in the first article of this series.
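Conceptually, the Istio success-rate check reduces to PromQL of roughly the following shape — a simplified sketch over istio_requests_total with namespace test, workload podinfo, and a 1m window substituted in, not Flagger's exact built-in definition:

sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="test",destination_workload="podinfo",response_code!~"5.*"}[1m]))
/
sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="test",destination_workload="podinfo"}[1m]))
* 100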

2 Custom Metrics
To show the extra flexibility that telemetry data brings to validating the canary environment during a release, we again take istio_requests_total as an example and create a MetricTemplate named not-found-percentage, which measures the percentage of requests returning a 404 error code out of all requests.

The configuration file metrics-404.yaml is as follows (see advanced_canary.sh for the complete script):

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}",
              response_code!="404"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}"
            }[{{ interval }}]
        )
    ) * 100

Run the following command to create the MetricTemplate:

k apply -f resources_canary2/metrics-404.yaml
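
Before wiring the template into the Canary, you can spot-check the query against Prometheus by hand, substituting the template variables ({{ namespace }} = test, {{ target }} = podinfo, {{ interval }} = 1m) — a sanity check that assumes port 9090 is reachable via port-forward:

kubectl --kubeconfig "$USER_CONFIG" -n istio-system port-forward svc/prometheus 9090:9090 &
curl -s 'http://localhost:9090/api/v1/query' --data-urlencode \
  'query=100 - sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="test",destination_workload="podinfo",response_code!="404"}[1m]))/sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="test",destination_workload="podinfo"}[1m])) * 100'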
Accordingly, the metrics configuration in the Canary is updated to:

analysis:
  metrics:
    - name: "404s percentage"
      templateRef:
        name: not-found-percentage
        namespace: istio-system
      thresholdRange:
        max: 5
      interval: 1m

3 Final Verification
Finally, we run the complete experiment script in one pass. The script advanced_canary.sh is as follows:

#!/usr/bin/env sh

SCRIPT_PATH="$(
    cd "$(dirname "$0")" >/dev/null 2>&1
    pwd -P
)/"
cd "$SCRIPT_PATH" || exit

source config
alias k="kubectl --kubeconfig $USER_CONFIG"
alias m="kubectl --kubeconfig $MESH_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

echo "#### I Bootstrap ####"
echo "1 Create a test namespace with Istio sidecar injection enabled:"
k delete ns test
m delete ns test
k create ns test
m create ns test
m label namespace test istio-injection=enabled

echo "2 Create a deployment and a horizontal pod autoscaler:"
k apply -f $FLAAGER_SRC/kustomize/podinfo/deployment.yaml -n test
k apply -f resources_hpa/requests_total_hpa.yaml
k get hpa -n test

echo "3 Deploy the load testing service to generate traffic during the canary analysis:"
k apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"

k get pod,svc -n test
echo "......"
sleep 40s

echo "4 Create a canary custom resource:"
k apply -f resources_canary2/metrics-404.yaml
k apply -f resources_canary2/podinfo-canary.yaml

k get pod,svc -n test
echo "......"
sleep 120s

echo "#### III Automated canary promotion ####"

echo "1 Trigger a canary deployment by updating the container image:"
k -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

echo "2 Flagger detects that the deployment revision changed and starts a new rollout:"

while true; do k -n test describe canary/podinfo; sleep 10s;done
Run the complete experiment script with:

sh progressive_delivery/advanced_canary.sh
The experiment output looks like this:

I Bootstrap

1 Create a test namespace with Istio sidecar injection enabled:
namespace "test" deleted
namespace "test" deleted
namespace/test created
namespace/test created
namespace/test labeled
2 Create a deployment and a horizontal pod autoscaler:
deployment.apps/podinfo created
horizontalpodautoscaler.autoscaling/podinfo-total created
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
podinfo-total Deployment/podinfo <unknown>/10 (avg) 1 5 0 0s
3 Deploy the load testing service to generate traffic during the canary analysis:
service/flagger-loadtester created
deployment.apps/flagger-loadtester created
NAME READY STATUS RESTARTS AGE
pod/flagger-loadtester-76798b5f4c-ftlbn 0/2 Init:0/1 0 1s
pod/podinfo-689f645b78-65n9d 1/1 Running 0 28s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/flagger-loadtester ClusterIP 172.21.15.223 <none> 80/TCP 1s
......
4 Create a canary custom resource:
metrictemplate.flagger.app/not-found-percentage created
canary.flagger.app/podinfo created
NAME READY STATUS RESTARTS AGE
pod/flagger-loadtester-76798b5f4c-ftlbn 2/2 Running 0 41s
pod/podinfo-689f645b78-65n9d 1/1 Running 0 68s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/flagger-loadtester ClusterIP 172.21.15.223 <none> 80/TCP 41s
......

III Automated canary promotion

1 Trigger a canary deployment by updating the container image:
deployment.apps/podinfo image updated
2 Flagger detects that the deployment revision changed and starts a new rollout:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Synced 10m flagger podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
Normal Synced 9m23s (x2 over 10m) flagger all the metrics providers are available!
Normal Synced 9m23s flagger Initialization done! podinfo.test
Normal Synced 8m23s flagger New revision detected! Scaling up podinfo.test
Normal Synced 7m23s flagger Starting canary analysis for podinfo.test
Normal Synced 7m23s flagger Pre-rollout check acceptance-test passed
Normal Synced 7m23s flagger Advance podinfo.test canary weight 10
Normal Synced 6m23s flagger Advance podinfo.test canary weight 20
Normal Synced 5m23s flagger Advance podinfo.test canary weight 30
Normal Synced 4m23s flagger Advance podinfo.test canary weight 40
Normal Synced 23s (x4 over 3m23s) flagger (combined from similar events): Promotion completed! Scaling down podinfo.test