The Mixerless Telemetry capability of Alibaba Cloud Service Mesh (ASM) provides non-intrusive telemetry data for business containers. This telemetry data serves two purposes: it is collected as monitoring metrics by ARMS Prometheus or a self-managed Prometheus for service mesh observability, and it is consumed by HPA and Flagger, forming the foundation of application-level autoscaling and progressive canary releases.

This series focuses on putting telemetry data to work for application-level autoscaling and progressive canary releases. It consists of three parts, covering telemetry data (monitoring metrics), application-level autoscaling, and progressive canary releases.

Overall Architecture
The overall architecture of this series is shown in the figure below:

ASM pushes the Mixerless Telemetry-related EnvoyFilter configuration to each ASM sidecar (Envoy), enabling the collection of application-level monitoring metrics.
Business traffic enters through the Ingress Gateway, and each ASM sidecar begins collecting the relevant monitoring metrics.
Prometheus scrapes the monitoring metrics from each POD.
HPA queries the metrics of the relevant PODs from Prometheus through the Adapter and scales the workload according to its configuration (see the sketch after this list).
Flagger queries the metrics of the relevant PODs from Prometheus and, based on its configuration, sends VirtualService configuration updates to ASM.
ASM pushes the VirtualService configuration to each ASM sidecar, thereby implementing the progressive canary release.
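To make the HPA step concrete, here is a minimal sketch of an HPA that scales podinfo on a custom metric served by a Prometheus Adapter. It is not taken from this article: the metric name istio_requests_per_second, the target value, the autoscaling/v2beta2 API version, and the presence of a configured Prometheus Adapter are all assumptions for illustration.

# A minimal sketch, assuming a Prometheus Adapter already exposes a per-pod custom metric
# named "istio_requests_per_second" derived from istio_requests_total (illustrative only).
cat <<EOF | kubectl --kubeconfig "$USER_CONFIG" apply -f -
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: istio_requests_per_second
      target:
        type: AverageValue
        averageValue: "10"
EOF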

Flagger Progressive Release Process
The Flagger documentation describes the progressive release process with the following steps (quoted from the official docs; a sketch of the Canary configuration these steps refer to follows the list):

With the above configuration, Flagger will run a canary release with the following steps:

detect new revision (deployment spec, secrets or configmaps changes)
scale from zero the canary deployment
wait for the HPA to set the canary minimum replicas
check canary pods health
run the acceptance tests
abort the canary release if tests fail
start the load tests
mirror 100% of the traffic from primary to canary
check request success rate and request duration every minute
abort the canary release if the metrics check failure threshold is reached
stop traffic mirroring after the number of iterations is reached
route live traffic to the canary pods
promote the canary (update the primary secrets, configmaps and deployment spec)
wait for the primary deployment rollout to finish
wait for the HPA to set the primary minimum replicas
check primary pods health
switch live traffic back to primary
scale to zero the canary
send notification with the canary analysis result
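For context, "the above configuration" refers to a Flagger Canary resource. Below is a minimal sketch of such a resource for the podinfo Deployment; the field values (interval, threshold, iterations, webhook URLs, load-test command) are illustrative assumptions and not the exact configuration used later in this series.

# A minimal Flagger Canary sketch (istio provider) with traffic mirroring enabled.
cat <<EOF | kubectl --kubeconfig "$USER_CONFIG" apply -f -
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  provider: istio
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  service:
    port: 9898
  analysis:
    interval: 1m          # how often the metrics are checked
    threshold: 5          # failed checks tolerated before aborting the release
    iterations: 10        # analysis iterations while traffic is mirrored
    mirror: true          # mirror 100% of primary traffic to the canary
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.test/
      metadata:
        type: bash
        cmd: "curl -s http://podinfo-canary.test:9898/version"
    - name: load-test
      type: rollout
      url: http://flagger-loadtester.test/
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
EOF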
Prerequisites
An ACK cluster has been created. For details, see Create a Managed Kubernetes Cluster.
An ASM instance has been created. For details, see Create an ASM Instance.
Setup Mixerless Telemetry
This article describes how to configure and collect application-level monitoring metrics based on ASM (for example, the total request count istio_requests_total and the request latency istio_request_duration). The main steps are creating the EnvoyFilter, verifying the Envoy telemetry data, and verifying that Prometheus scrapes the telemetry data.

1 EnvoyFilter
Log in to the ASM console, choose Service Mesh > Mesh Management in the left navigation pane, and go to the Feature Settings page of the ASM instance.

Select Enable collection of Prometheus monitoring metrics.
Select Enable self-managed Prometheus and enter the Prometheus service address prometheus:9090 (this series uses the community edition of Prometheus, and this address is used throughout the rest of the series). If you use Alibaba Cloud ARMS, see Integrate ARMS Prometheus to Implement Mesh Monitoring.
Select Enable Kiali (optional).

After you click OK, the list of EnvoyFilters generated by ASM appears on the control plane:
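As a command-line sketch, the generated EnvoyFilters can also be listed directly from the ASM control plane; $MESH_CONFIG below is an assumed kubeconfig pointing at the ASM instance (it is not one of this article's variables).

# List the Mixerless Telemetry EnvoyFilters generated by ASM on the control plane.
kubectl --kubeconfig "$MESH_CONFIG" get envoyfilter -n istio-system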

2 Prometheus
2.1 Install
Run the following command to install Prometheus (for the complete script, see demo_mixerless.sh):

kubectl --kubeconfig "$USER_CONFIG" apply -f $ISTIO_SRC/samples/addons/prometheus.yaml
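Before continuing, it is worth confirming that the Prometheus addon pod is running; a simple check (a sketch, since pod labels vary across Istio versions) is:

# Confirm the Prometheus addon pod in istio-system has reached the Running state.
kubectl --kubeconfig "$USER_CONFIG" -n istio-system get pods | grep prometheus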
2.2 Config Scrape
After installing Prometheus, we need to add the Istio-related scrape targets to its configuration. Log in to the ACK console, choose Configurations > ConfigMaps in the left navigation pane, find the prometheus entry under istio-system, and click Edit.

In the prometheus.yml configuration, append the contents of scrape_configs.yaml to the scrape_configs section.

After saving the configuration, choose Workloads > Pods in the left navigation pane, find the prometheus entry under istio-system, and delete the Prometheus POD so that the new configuration takes effect in the newly created POD (a command-line alternative is sketched below).
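A sketch of the equivalent command-line step, assuming the addon pod carries the app=prometheus label (adjust the selector if your manifest labels it differently):

# Delete the Prometheus pod so its Deployment recreates it with the updated ConfigMap.
kubectl --kubeconfig "$USER_CONFIG" -n istio-system delete pod -l app=prometheus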

You can run the following command to check the job_name entries in the Prometheus configuration:

kubectl --kubeconfig "$USER_CONFIG" get cm prometheus -n istio-system -o jsonpath='{.data.prometheus\.yml}' | grep job_name

- job_name: 'istio-mesh'
- job_name: 'envoy-stats'
- job_name: 'istio-policy'
- job_name: 'istio-telemetry'
- job_name: 'pilot'
- job_name: 'sidecar-injector'
- job_name: prometheus
- job_name: kubernetes-apiservers
- job_name: kubernetes-nodes
- job_name: kubernetes-nodes-cadvisor
- job_name: kubernetes-service-endpoints
- job_name: kubernetes-service-endpoints-slow
- job_name: prometheus-pushgateway
- job_name: kubernetes-services
- job_name: kubernetes-pods
- job_name: kubernetes-pods-slow

Mixerless Verification
1 podinfo
1.1 Deploy
Use the following commands to deploy podinfo, the sample application used in this series:

kubectl --kubeconfig "$USER_CONFIG" apply -f $PODINFO_SRC/kustomize/deployment.yaml -n test
kubectl --kubeconfig "$USER_CONFIG" apply -f $PODINFO_SRC/kustomize/service.yaml -n test
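Note: the two commands above assume that the test namespace already exists and has automatic sidecar injection enabled. A minimal sketch to create it, using the standard Istio injection label (an assumption here, since the article does not show this step):

# Create the test namespace and enable automatic sidecar injection for it.
kubectl --kubeconfig "$USER_CONFIG" create namespace test
kubectl --kubeconfig "$USER_CONFIG" label namespace test istio-injection=enabled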
1.2 Generate Load
Use the following commands to send requests to podinfo and generate monitoring metric data:

podinfo_pod=$(kubectl --kubeconfig "$USER_CONFIG" get po -n test -l app=podinfo -o jsonpath='{.items..metadata.name}')
for i in {1..10}; do
  kubectl --kubeconfig "$USER_CONFIG" exec $podinfo_pod -c podinfod -n test -- curl -s podinfo:9898/version
  echo
done
2 Verify Metric Generation (Envoy)
The monitoring metrics this series focuses on are istio_requests_total and istio_request_duration. First, we confirm inside the Envoy container that these metrics have been generated.

2.1 istio_requests_total
Use the following command to query Envoy for its stats metrics and confirm that they include istio_requests_total:

kubectl --kubeconfig "$USER_CONFIG" exec $podinfo_pod -n test -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep istio_requests_total
The returned results are as follows:

:::: istio_requests_total ::::

# TYPE istio_requests_total counter

istio_requests_total{response_code="200",reporter="destination",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest"} 10

istio_requests_total{response_code="200",reporter="source",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="unknown",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest"} 10
2.2 istio_request_duration
Use the following command to query Envoy for its stats metrics and confirm that they include istio_request_duration:

kubectl --kubeconfig "$USER_CONFIG" exec $podinfo_pod -n test -c istio-proxy -- curl -s localhost:15090/stats/prometheus | grep istio_request_duration
The returned results are as follows:

:::: istio_request_duration ::::

# TYPE istio_request_duration_milliseconds histogram

istio_request_duration_milliseconds_bucket{response_code="200",reporter="destination",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest",le="0.5"} 10

istio_request_duration_milliseconds_bucket{response_code="200",reporter="destination",source_workload="podinfo",source_workload_namespace="test",source_principal="spiffe://cluster.local/ns/test/sa/default",source_app="podinfo",source_version="unknown",source_cluster="c199d81d4e3104a5d90254b2a210914c8",destination_workload="podinfo",destination_workload_namespace="test",destination_principal="spiffe://cluster.local/ns/test/sa/default",destination_app="podinfo",destination_version="unknown",destination_service="podinfo.test.svc.cluster.local",destination_service_name="podinfo",destination_service_namespace="test",destination_cluster="c199d81d4e3104a5d90254b2a210914c8",request_protocol="http",response_flags="-",grpc_response_status="",connection_security_policy="mutual_tls",source_canonical_service="podinfo",destination_canonical_service="podinfo",source_canonical_revision="latest",destination_canonical_revision="latest",le="1"} 10
...
3 Verify Metric Collection (Prometheus)
Finally, we verify that the monitoring metric data generated by Envoy is being scraped by Prometheus in real time. Expose the Prometheus service, open it in a browser, and enter istio_requests_total in the query box; the result is shown in the figure below.
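A minimal sketch for exposing Prometheus locally, assuming the addon Service installed above is named prometheus and listens on port 9090:

# Forward the in-cluster Prometheus service to localhost, then open http://localhost:9090 in a browser.
kubectl --kubeconfig "$USER_CONFIG" -n istio-system port-forward svc/prometheus 9090:9090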


This article is original content from Alibaba Cloud and may not be reproduced without permission.