
SuperEdge Distributed Health Check Explained: The Cloud Side

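Du Yanghao, senior engineer at Tencent Cloud, is passionate about open source, containers and Kubernetes. He currently works mainly on image registries, Kubernetes cluster high availability & backup/restore, and edge computing R&D.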

Preface

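SuperEdge's distributed health check feature consists of edge-health-daemon on the edge side and edge-health-admission on the cloud side:

  • edge-health-daemon: performs distributed health checks on edge nodes in the same region and sends the resulting health-status votes to the apiserver (as annotations on the node)
  • edge-health-admission: continuously adjusts, based on the node's edge-health annotation, the node taints set by kube-controller-manager (removing the NoExecute taint) and the endpoints (moving pods on disconnected nodes from the endpoint subsets' notReadyAddresses to addresses), so that the cloud and the edge jointly determine node status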

The overall architecture is shown below:

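The cloud-side edge-health-admission component exists because, when the cloud-edge connection is lost, kube-controller-manager does the following:

  • sets the disconnected node's Ready condition to Unknown (ConditionUnknown) and adds the NoSchedule and NoExecute taints
  • moves the pods on the disconnected node out of the ready addresses of their Services' Endpoints

When edge-health-daemon determines on the edge side, from its health checks, that the node is healthy, it updates the node to remove the NoExecute taint. However, right after that update succeeds, kube-controller-manager flushes the node back (adding the NoExecute taint again). A Kubernetes mutating admission webhook, namely edge-health-admission, is therefore needed to adjust the changes kube-controller-manager makes to the node API resource; only then does the distributed health check take effect.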

Before diving into the source code, let's first introduce Kubernetes Admission Controllers:

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. The controllers consist of the list below, are compiled into the kube-apiserver binary, and may only be configured by the cluster administrator. In that list, there are two special controllers: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These execute the mutating and validating (respectively) admission control webhooks which are configured in the API.

Kubernetes Admission Controllers are a stage in kube-apiserver's handling of API requests: they are invoked after a request has been authenticated and authorized, but before the object is persisted, and they validate or mutate the request (or both).

There are many admission controllers, most of which are compiled into kube-apiserver. MutatingAdmissionWebhook and ValidatingAdmissionWebhook are special: they call external mutating and validating admission control webhooks, respectively.

Admission webhooks are HTTP callbacks that receive admission requests and do something with them. You can define two types of admission webhooks, validating admission webhook and mutating admission webhook. Mutating admission webhooks are invoked first, and can modify objects sent to the API server to enforce custom defaults. After all object modifications are complete, and after the incoming object is validated by the API server, validating admission webhooks are invoked and can reject requests to enforce custom policies.

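Admission webhooks are HTTP callback services that receive AdmissionReview requests and process them. By how they process requests, admission webhooks fall into two types:

  • validating admission webhook: configured via ValidatingWebhookConfiguration; performs admission validation on API requests but cannot modify the request object
  • mutating admission webhook: configured via MutatingWebhookConfiguration; performs admission validation on API requests and can also modify the request object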
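To make the ordering concrete, here is a minimal Go sketch (standard library only) of how one request passes through the two webhook phases: mutating hooks run first, each seeing the previous hook's output, and validating hooks then see the final object and can only accept or reject it. The types `object`, `mutatingHook`, `validatingHook` and the functions `admit`, `addDefaultTeam`, `requireTeam` are hypothetical simplifications, not kube-apiserver code; real objects are JSON documents, real mutations are JSON patches, and reinvocation and built-in validation are not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for an API object: a flat label map.
type object map[string]string

// mutatingHook may return a modified copy of the object; validatingHook may only accept or reject it.
type mutatingHook func(obj object) (object, error)
type validatingHook func(obj object) error

// admit runs the mutating hooks in order, each seeing the previous hook's output,
// then runs every validating hook against the final object.
func admit(obj object, mutating []mutatingHook, validating []validatingHook) (object, error) {
	current := obj
	for _, m := range mutating {
		next, err := m(current)
		if err != nil {
			return nil, fmt.Errorf("mutating webhook rejected the request: %w", err)
		}
		current = next
	}
	for _, v := range validating {
		if err := v(current); err != nil {
			return nil, fmt.Errorf("validating webhook rejected the request: %w", err)
		}
	}
	return current, nil
}

// addDefaultTeam is a sample mutating hook that enforces a default "team" label.
func addDefaultTeam(obj object) (object, error) {
	out := object{}
	for k, v := range obj {
		out[k] = v
	}
	if _, ok := out["team"]; !ok {
		out["team"] = "default"
	}
	return out, nil
}

// requireTeam is a sample validating hook that rejects objects without a "team" label.
func requireTeam(obj object) error {
	if obj["team"] == "" {
		return errors.New("missing team label")
	}
	return nil
}

func main() {
	result, err := admit(object{"app": "web"}, []mutatingHook{addDefaultTeam}, []validatingHook{requireTeam})
	fmt.Println(result["team"], err) // default <nil>
}
```

Because validation runs only after every mutation, a validating webhook can rely on defaults injected by mutating webhooks, which is exactly the split the quoted documentation describes.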

Both types of webhooks are configured with the following fields, which also determine which requests they match:

  • admissionReviewVersions: the AdmissionReview API resource versions the webhook supports, in order of preference (API servers send the first AdmissionReview version in the admissionReviewVersions list they support)
  • name: the webhook name (must be unique if a WebhookConfiguration defines several webhooks)
  • clientConfig: how to reach the webhook server (url or service) and the CA bundle (optionally include a custom CA bundle to use to verify the TLS connection)
  • namespaceSelector: a labelSelector on the namespace of the requested resource
  • objectSelector: a labelSelector on the requested object itself
  • rules: restricts matching requests by operations, apiGroups, apiVersions, resources and resource scope:

    • operations: the list of request operations (Can be “CREATE”, “UPDATE”, “DELETE”, “CONNECT”, or “*” to match all.)
    • apiGroups: the list of API groups of the requested resource (“” is the core API group. “*” matches all API groups.)
    • apiVersions: the list of API versions of the requested resource (“*” matches all API versions.)
    • resources: the requested resource types (nodes, deployments, etc.)
    • scope: the scope of the requested resource (Cluster, Namespaced or *)
  • timeoutSeconds: how long the apiserver waits for the webhook to respond; on timeout, the call counts as failed and is handled according to failurePolicy
  • failurePolicy: how the apiserver handles a failed call to the admission webhook:

    • Ignore:means that an error calling the webhook is ignored and the API request is allowed to continue.
    • Fail:means that an error calling the webhook causes the admission to fail and the API request to be rejected.
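  • matchPolicy: how rules are matched against incoming API requests:

    • Exact: a request matches only if it matches the rules exactly
    • Equivalent: a request also matches if the resource it targets can be converted (the apiserver can convert objects between API versions) to a group/version/resource listed in rules; the converted request is then sent to the admission webhook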
  • reinvocationPolicy:In v1.15+, to allow mutating admission plugins to observe changes made by other plugins, built-in mutating admission plugins are re-run if a mutating webhook modifies an object, and mutating webhooks can specify a reinvocationPolicy to control whether they are reinvoked as well.

    • Never: the webhook must not be called more than once in a single admission evaluation
    • IfNeeded: the webhook may be called again as part of the admission evaluation if the object being admitted is modified by other admission plugins after the initial webhook call.
  • sideEffects: some webhooks modify other resources in addition to the AdmissionReview content ("side effects"). sideEffects declares whether a webhook has side effects; its values are:

    • None: calling the webhook will have no side effects.
    • NoneOnDryRun: calling the webhook will possibly have side effects, but if a request with dryRun: true is sent to the webhook, the webhook will suppress the side effects (the webhook is dryRun-aware).
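The interplay between timeoutSeconds and failurePolicy can be sketched in Go as follows. `callWebhook`, `webhookFunc`, `slowHook`, `approveHook` and the `failurePolicy` constants are hypothetical; the real apiserver issues an HTTPS request to the address in clientConfig, and the timeout here is scaled down to milliseconds so the demo runs quickly. The decision rule is the one listed above: an error or timeout admits the request under Ignore and rejects it under Fail.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// failurePolicy mirrors the two values of webhooks[].failurePolicy.
type failurePolicy string

const (
	policyIgnore failurePolicy = "Ignore"
	policyFail   failurePolicy = "Fail"
)

// demoTimeout stands in for timeoutSeconds, scaled down so the demo runs quickly.
const demoTimeout = 20 * time.Millisecond

// webhookFunc is a hypothetical stand-in for the HTTPS round trip to the webhook;
// it reports whether the webhook allowed the request.
type webhookFunc func(ctx context.Context) (bool, error)

// callWebhook bounds the call by timeout and applies policy when the call fails or times out.
func callWebhook(hook webhookFunc, timeout time.Duration, policy failurePolicy) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	allowed, err := hook(ctx)
	if err == nil {
		return allowed, nil
	}
	if policy == policyIgnore {
		// Ignore: the error is swallowed and the API request continues.
		return true, nil
	}
	// Fail: the error causes the API request to be rejected.
	return false, fmt.Errorf("failed calling webhook: %w", err)
}

// slowHook never answers before the deadline.
func slowHook(ctx context.Context) (bool, error) {
	<-ctx.Done()
	return false, ctx.Err()
}

// approveHook answers immediately and allows the request.
func approveHook(ctx context.Context) (bool, error) {
	return true, nil
}

func main() {
	allowed, err := callWebhook(slowHook, demoTimeout, policyIgnore)
	fmt.Println(allowed, err) // true <nil>
	allowed, err = callWebhook(slowHook, demoTimeout, policyFail)
	fmt.Println(allowed, errors.Is(err, context.DeadlineExceeded)) // false true
}
```

This is why edge-health-admission can use `failurePolicy: Ignore` safely: if the webhook is slow or down, node and endpoints updates from kube-controller-manager still go through, just without the correction.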

Here is the MutatingWebhookConfiguration of edge-health-admission as a reference:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: edge-health-admission
webhooks:
  - admissionReviewVersions:
      - v1
    clientConfig:
      caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNwRENDQVl3Q0NRQ2RaL0w2akZSSkdqQU5CZ2txaGtpRzl3MEJBUXNGQURBVU1SSXdFQVlEVlFRRERBbFgKYVhObE1tTWdRMEV3SGhjTk1qQXdOekU0TURRek9ERTNXaGNOTkRjeE1qQTBNRFF6T0RFM1dqQVVNUkl3RUFZRApWUVFEREFsWGFYTmxNbU1nUTBFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUNSCnhHT2hrODlvVkRHZklyVDBrYVkwajdJQVJGZ2NlVVFmVldSZVhVcjh5eEVOQkF6ZnJNVVZyOWlCNmEwR2VFL3cKZzdVdW8vQWtwUEgrbzNQNjFxdWYrTkg1UDBEWHBUd1pmWU56VWtyaUVja3FOSkYzL2liV0o1WGpFZUZSZWpidgpST1V1VEZabmNWOVRaeTJISVF2UzhTRzRBTWJHVmptQXlDMStLODBKdDI3QUl4YmdndmVVTW8xWFNHYnRxOXlJCmM3Zk1QTXJMSHhaOUl5aTZla3BwMnJrNVdpeU5YbXZhSVA4SmZMaEdnTU56YlJaS1RtL0ZKdDdyV0dhQ1orNXgKV0kxRGJYQ2MyWWhmbThqU1BqZ3NNQTlaNURONDU5ellJSkVhSTFHeFI3MlhaUVFMTm8zdE5jd3IzVlQxVlpiTgo1cmhHQlVaTFlrMERtd25vWTBCekFnTUJBQUV3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQUhuUDJibnJBcWlWCjYzWkpMVzM0UWFDMnRreVFScTNVSUtWR3RVZHFobWRVQ0I1SXRoSUlleUdVRVdqVExpc3BDQzVZRHh4YVdrQjUKTUxTYTlUY0s3SkNOdkdJQUdQSDlILzRaeXRIRW10aFhiR1hJQ3FEVUVmSUVwVy9ObUgvcnBPQUxhYlRvSUVzeQpVNWZPUy9PVVZUM3ZoSldlRjdPblpIOWpnYk1SZG9zVElhaHdQdTEzZEtZMi8zcEtxRW1Cd1JkbXBvTExGbW9MCmVTUFQ4SjREZExGRkh2QWJKalFVbjhKQTZjOHUrMzZJZDIrWE1sTGRZYTdnTnhvZTExQTl6eFJQczRXdlpiMnQKUXZpbHZTbkFWb0ZUSVozSlpjRXVWQXllNFNRY1dKc3FLMlM0UER1VkNFdlg0SmRCRlA2NFhvU08zM3pXaWhtLworMXg3OXZHMUpFcz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
      service:
        namespace: kube-system
        name: edge-health-admission
        path: /node-taint
    failurePolicy: Ignore
    matchPolicy: Exact
    name: node-taint.k8s.io
    namespaceSelector: {}
    objectSelector: {}
    reinvocationPolicy: Never
    rules:
      - apiGroups:
          - '*'
        apiVersions:
          - '*'
        operations:
          - UPDATE
        resources:
          - nodes
        scope: '*'
    sideEffects: None
    timeoutSeconds: 5
  - admissionReviewVersions:
      - v1
    clientConfig:
      caBundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNwRENDQVl3Q0NRQ2RaL0w2akZSSkdqQU5CZ2txaGtpRzl3MEJBUXNGQURBVU1SSXdFQVlEVlFRRERBbFgKYVhObE1tTWdRMEV3SGhjTk1qQXdOekU0TURRek9ERTNXaGNOTkRjeE1qQTBNRFF6T0RFM1dqQVVNUkl3RUFZRApWUVFEREFsWGFYTmxNbU1nUTBFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUNSCnhHT2hrODlvVkRHZklyVDBrYVkwajdJQVJGZ2NlVVFmVldSZVhVcjh5eEVOQkF6ZnJNVVZyOWlCNmEwR2VFL3cKZzdVdW8vQWtwUEgrbzNQNjFxdWYrTkg1UDBEWHBUd1pmWU56VWtyaUVja3FOSkYzL2liV0o1WGpFZUZSZWpidgpST1V1VEZabmNWOVRaeTJISVF2UzhTRzRBTWJHVmptQXlDMStLODBKdDI3QUl4YmdndmVVTW8xWFNHYnRxOXlJCmM3Zk1QTXJMSHhaOUl5aTZla3BwMnJrNVdpeU5YbXZhSVA4SmZMaEdnTU56YlJaS1RtL0ZKdDdyV0dhQ1orNXgKV0kxRGJYQ2MyWWhmbThqU1BqZ3NNQTlaNURONDU5ellJSkVhSTFHeFI3MlhaUVFMTm8zdE5jd3IzVlQxVlpiTgo1cmhHQlVaTFlrMERtd25vWTBCekFnTUJBQUV3RFFZSktvWklodmNOQVFFTEJRQURnZ0VCQUhuUDJibnJBcWlWCjYzWkpMVzM0UWFDMnRreVFScTNVSUtWR3RVZHFobWRVQ0I1SXRoSUlleUdVRVdqVExpc3BDQzVZRHh4YVdrQjUKTUxTYTlUY0s3SkNOdkdJQUdQSDlILzRaeXRIRW10aFhiR1hJQ3FEVUVmSUVwVy9ObUgvcnBPQUxhYlRvSUVzeQpVNWZPUy9PVVZUM3ZoSldlRjdPblpIOWpnYk1SZG9zVElhaHdQdTEzZEtZMi8zcEtxRW1Cd1JkbXBvTExGbW9MCmVTUFQ4SjREZExGRkh2QWJKalFVbjhKQTZjOHUrMzZJZDIrWE1sTGRZYTdnTnhvZTExQTl6eFJQczRXdlpiMnQKUXZpbHZTbkFWb0ZUSVozSlpjRXVWQXllNFNRY1dKc3FLMlM0UER1VkNFdlg0SmRCRlA2NFhvU08zM3pXaWhtLworMXg3OXZHMUpFcz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
      service:
        namespace: kube-system
        name: edge-health-admission
        path: /endpoint
    failurePolicy: Ignore
    matchPolicy: Exact
    name: endpoint.k8s.io
    namespaceSelector: {}
    objectSelector: {}
    reinvocationPolicy: Never
    rules:
      - apiGroups:
          - '*'
        apiVersions:
          - '*'
        operations:
          - UPDATE
        resources:
          - endpoints
        scope: '*'
    sideEffects: None
    timeoutSeconds: 5
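To see which API requests the configuration above actually sends to edge-health-admission, here is a simplified Go sketch of rule matching under `matchPolicy: Exact`. The `rule` and `request` structs and the `contains`/`matches` functions are hypothetical simplifications: they cover operations, apiGroups, apiVersions and resources with the `*` wildcard, and ignore scope, namespaceSelector, objectSelector and subresources (in the real apiserver, `nodes` does not match a subresource such as `nodes/status`). The endpoint.k8s.io webhook works the same way with `endpoints`.

```go
package main

import "fmt"

// rule mirrors the fields of one entry under webhooks[].rules.
type rule struct {
	Operations  []string
	APIGroups   []string
	APIVersions []string
	Resources   []string
}

// request carries the attributes of an incoming API request that rules are matched against.
type request struct {
	Operation, Group, Version, Resource string
}

// contains reports whether list contains value or the "*" wildcard.
func contains(list []string, value string) bool {
	for _, item := range list {
		if item == "*" || item == value {
			return true
		}
	}
	return false
}

// matches reports whether any rule selects the request (matchPolicy: Exact).
func matches(rules []rule, req request) bool {
	for _, r := range rules {
		if contains(r.Operations, req.Operation) &&
			contains(r.APIGroups, req.Group) &&
			contains(r.APIVersions, req.Version) &&
			contains(r.Resources, req.Resource) {
			return true
		}
	}
	return false
}

// nodeTaintRules reproduces the rules of the node-taint.k8s.io webhook above.
var nodeTaintRules = []rule{{
	Operations:  []string{"UPDATE"},
	APIGroups:   []string{"*"},
	APIVersions: []string{"*"},
	Resources:   []string{"nodes"},
}}

func main() {
	fmt.Println(matches(nodeTaintRules, request{"UPDATE", "", "v1", "nodes"}))     // true
	fmt.Println(matches(nodeTaintRules, request{"CREATE", "", "v1", "nodes"}))     // false
	fmt.Println(matches(nodeTaintRules, request{"UPDATE", "", "v1", "endpoints"})) // false
}
```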

kube-apiserver sends webhooks an AdmissionReview (apiGroup: admission.k8s.io, apiVersion: v1 or v1beta1) encoded as JSON, for example:

# This example shows the data contained in an AdmissionReview object for a request to update the scale subresource of an apps/v1 Deployment
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    # Random uid uniquely identifying this admission call
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    # Fully-qualified group/version/kind of the incoming object
    "kind": {"group":"autoscaling","version":"v1","kind":"Scale"},
    # Fully-qualified group/version/kind of the resource being modified
    "resource": {"group":"apps","version":"v1","resource":"deployments"},
    # subresource, if the request is to a subresource
    "subResource": "scale",
    # Fully-qualified group/version/kind of the incoming object in the original request to the API server.
    # This only differs from `kind` if the webhook specified `matchPolicy: Equivalent` and the
    # original request to the API server was converted to a version the webhook registered for.
    "requestKind": {"group":"autoscaling","version":"v1","kind":"Scale"},
    # Fully-qualified group/version/kind of the resource being modified in the original request to the API server.
    # This only differs from `resource` if the webhook specified `matchPolicy: Equivalent` and the
    # original request to the API server was converted to a version the webhook registered for.
    "requestResource": {"group":"apps","version":"v1","resource":"deployments"},
    # subresource, if the request is to a subresource
    # This only differs from `subResource` if the webhook specified `matchPolicy: Equivalent` and the
    # original request to the API server was converted to a version the webhook registered for.
    "requestSubResource": "scale",
    # Name of the resource being modified
    "name": "my-deployment",
    # Namespace of the resource being modified, if the resource is namespaced (or is a Namespace object)
    "namespace": "my-namespace",
    # operation can be CREATE, UPDATE, DELETE, or CONNECT
    "operation": "UPDATE",
    "userInfo": {
      # Username of the authenticated user making the request to the API server
      "username": "admin",
      # UID of the authenticated user making the request to the API server
      "uid": "014fbff9a07c",
      # Group memberships of the authenticated user making the request to the API server
      "groups": ["system:authenticated","my-admin-group"],
      # Arbitrary extra info associated with the user making the request to the API server.
      # This is populated by the API server authentication layer and should be included
      # if any SubjectAccessReview checks are performed by the webhook.
      "extra": {"some-key":["some-value1", "some-value2"]
      }
    },
    # object is the new object being admitted.
    # It is null for DELETE operations.
    "object": {"apiVersion":"autoscaling/v1","kind":"Scale",...},
    # oldObject is the existing object.
    # It is null for CREATE and CONNECT operations.
    "oldObject": {"apiVersion":"autoscaling/v1","kind":"Scale",...},
    # options contains the options for the operation being admitted, like meta.k8s.io/v1 CreateOptions, UpdateOptions, or DeleteOptions.
    # It is null for CONNECT operations.
    "options": {"apiVersion":"meta.k8s.io/v1","kind":"UpdateOptions",...},
    # dryRun indicates the API request is running in dry run mode and will not be persisted.
    # Webhooks with side effects should avoid actuating those side effects when dryRun is true.
    # See http://k8s.io/docs/reference/using-api/api-concepts/#make-a-dry-run-request for more details.
    "dryRun": false
  }
}
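A webhook typically decodes only the fields it needs. The Go sketch below uses encoding/json with hypothetical minimal structs (`admissionReview`, `admissionRequest`, `groupVersionResource`) instead of the real `k8s.io/api/admission/v1` types, and keeps `object` as raw JSON so it can later be decoded into a concrete type, much as edge-health-admission does with `Request.Object.Raw`. The node name `edge-node-1` is made up for the example, and the `#` comments shown above are explanatory only: a real payload is plain JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type groupVersionResource struct {
	Group    string `json:"group"`
	Version  string `json:"version"`
	Resource string `json:"resource"`
}

type admissionRequest struct {
	UID       string               `json:"uid"`
	Resource  groupVersionResource `json:"resource"`
	Name      string               `json:"name"`
	Operation string               `json:"operation"`
	// Object is left undecoded; the handler decides which Go type it holds.
	Object json.RawMessage `json:"object"`
}

type admissionReview struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Request    *admissionRequest `json:"request"`
}

// decodeReview parses an AdmissionReview body and rejects one without a request stanza.
func decodeReview(body []byte) (*admissionReview, error) {
	var review admissionReview
	if err := json.Unmarshal(body, &review); err != nil {
		return nil, err
	}
	if review.Request == nil {
		return nil, fmt.Errorf("AdmissionReview has no request")
	}
	return &review, nil
}

const sampleBody = `{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",
    "resource": {"group": "", "version": "v1", "resource": "nodes"},
    "name": "edge-node-1",
    "operation": "UPDATE",
    "object": {"apiVersion": "v1", "kind": "Node", "metadata": {"name": "edge-node-1"}}
  }
}`

func main() {
	review, err := decodeReview([]byte(sampleBody))
	if err != nil {
		panic(err)
	}
	fmt.Println(review.Request.UID, review.Request.Operation, review.Request.Resource.Resource)
}
```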

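The webhook must reply to kube-apiserver with an AdmissionReview of the same version, also encoded as JSON, containing the following key fields:

  • uid: copied from request.uid of the AdmissionReview sent to the webhook
  • allowed: true admits the request; false rejects it
  • status: when rejecting a request, the reason can be given in status (HTTP code and message)
  • patch: base64-encoded; a series of JSON Patch operations that the mutating admission webhook applies to the request object
  • patchType: currently only JSONPatch is supported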

For example:

# A webhook response applying the JSON patch [{"op": "add", "path": "/spec/replicas", "value": 3}], base64-encoded in the patch field, would be:
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "<value from request.uid>",
    "allowed": true,
    "patchType": "JSONPatch",
    "patch": "W3sib3AiOiAiYWRkIiwgInBhdGgiOiAiL3NwZWMvcmVwbGljYXMiLCAidmFsdWUiOiAzfV0="
  }
}

edge-health-admission is essentially a mutating admission webhook that selectively modifies UPDATE requests for endpoints and nodes. Its principles are analyzed in detail below.

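edge-health-admission source code analysis

edge-health-admission is written closely following the official example. Here is its entry point, which starts the HTTPS listener: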

func (eha *EdgeHealthAdmission) Run(stopCh <-chan struct{}) {
    if !cache.WaitForNamedCacheSync("edge-health-admission", stopCh, eha.cfg.NodeInformer.Informer().HasSynced) {
        return
    }
    http.HandleFunc("/node-taint", eha.serveNodeTaint)
    http.HandleFunc("/endpoint", eha.serveEndpoint)
    server := &http.Server{Addr: eha.cfg.Addr}
    go func() {
        if err := server.ListenAndServeTLS(eha.cfg.CertFile, eha.cfg.KeyFile); err != http.ErrServerClosed {
            klog.Fatalf("ListenAndServeTLS err %+v", err)
        }
    }()
    // Block until stopCh is closed; a for/select with an empty default case would spin and burn a CPU core
    <-stopCh
    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        klog.Errorf("Server: program exit, server exit error %+v", err)
    }
}

Two route handlers are registered here:

  • /node-taint: handled by serveNodeTaint, which mutates node UPDATE requests
  • /endpoint: handled by serveEndpoint, which mutates endpoints UPDATE requests

Both handlers call the serve function:

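// serve handles the http portion of a request prior to handing to an admit function
func serve(w http.ResponseWriter, r *http.Request, admit admitFunc) {
    var body []byte
    if r.Body != nil {
        if data, err := ioutil.ReadAll(r.Body); err == nil {
            body = data
        }
    }
    // verify the content type is accurate
    contentType := r.Header.Get("Content-Type")
    if contentType != "application/json" {
        klog.Errorf("contentType=%s, expect application/json", contentType)
        http.Error(w, "expect application/json", http.StatusUnsupportedMediaType)
        return
    }
    klog.V(4).Info(fmt.Sprintf("handling request: %s", body))
    // The AdmissionReview that was sent to the webhook
    requestedAdmissionReview := admissionv1.AdmissionReview{}
    // The AdmissionReview that will be returned; admission.k8s.io/v1 callers expect apiVersion and kind on it
    responseAdmissionReview := admissionv1.AdmissionReview{}
    responseAdmissionReview.SetGroupVersionKind(admissionv1.SchemeGroupVersion.WithKind("AdmissionReview"))
    deserializer := codecs.UniversalDeserializer()
    if _, _, err := deserializer.Decode(body, nil, &requestedAdmissionReview); err != nil {
        klog.Error(err)
        responseAdmissionReview.Response = toAdmissionResponse(err)
    } else if requestedAdmissionReview.Request == nil {
        responseAdmissionReview.Response = toAdmissionResponse(fmt.Errorf("AdmissionReview has no request"))
    } else {
        // pass to admitFunc
        responseAdmissionReview.Response = admit(requestedAdmissionReview)
    }
    // Guard against an admit function returning nil, which would panic below
    if responseAdmissionReview.Response == nil {
        responseAdmissionReview.Response = &admissionv1.AdmissionResponse{Allowed: true}
    }
    // Return the same UID (Request is nil when the body could not be decoded)
    if requestedAdmissionReview.Request != nil {
        responseAdmissionReview.Response.UID = requestedAdmissionReview.Request.UID
    }
    klog.V(4).Info(fmt.Sprintf("sending response: %+v", responseAdmissionReview.Response))
    respBytes, err := json.Marshal(responseAdmissionReview)
    if err != nil {
        klog.Error(err)
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    if _, err := w.Write(respBytes); err != nil {
        klog.Error(err)
    }
}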

serve works as follows:

  • decode request.Body into an AdmissionReview object, stored in requestedAdmissionReview
  • run the admit function on that AdmissionReview and store the result in responseAdmissionReview
  • set responseAdmissionReview.Response.UID to the UID of the request (AdmissionReview.Request.UID)

The admit functions of serveNodeTaint and serveEndpoint are mutateNodeTaint and mutateEndpoint respectively; they are analyzed in turn below.

1. mutateNodeTaint

mutateNodeTaint modifies node UPDATE requests according to the distributed health check result:

func (eha *EdgeHealthAdmission) mutateNodeTaint(ar admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
    klog.V(4).Info("mutating node taint")
    nodeResource := metav1.GroupVersionResource{Group: "", Version: "v1", Resource: "nodes"}
    if ar.Request.Resource != nodeResource {
        klog.Errorf("expect resource to be %s", nodeResource)
        // Admit unchanged instead of returning nil, which serve would dereference
        return &admissionv1.AdmissionResponse{Allowed: true}
    }
    var node corev1.Node
    deserializer := codecs.UniversalDeserializer()
    if _, _, err := deserializer.Decode(ar.Request.Object.Raw, nil, &node); err != nil {
        klog.Error(err)
        return toAdmissionResponse(err)
    }
    reviewResponse := admissionv1.AdmissionResponse{}
    reviewResponse.Allowed = true
    // Only act on nodes whose Ready condition has gone Unknown (cloud-edge link lost)
    if index, condition := util.GetNodeCondition(&node.Status, v1.NodeReady); index != -1 && condition.Status == v1.ConditionUnknown {
        if node.Annotations != nil {
            var patches []*patch
            // The edge-side vote says the node is healthy: drop the unreachable NoExecute taint
            if healthy, existed := node.Annotations[common.NodeHealthAnnotation]; existed && healthy == common.NodeHealthAnnotationPros {
                if index, existed := util.TaintExistsPosition(node.Spec.Taints, common.UnreachableNoExecuteTaint); existed {
                    patches = append(patches, &patch{
                        OP:   "remove",
                        Path: fmt.Sprintf("/spec/taints/%d", index),
                    })
                    klog.V(4).Infof("UnreachableNoExecuteTaint: remove %d taints %v", index, node.Spec.Taints[index])
                }
            }
            if len(patches) > 0 {
                patchBytes, _ := json.Marshal(patches)
                reviewResponse.Patch = patchBytes
                pt := admissionv1.PatchTypeJSONPatch
                reviewResponse.PatchType = &pt
            }
        }
    }
    return &reviewResponse
}

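The main logic is:

  • check that AdmissionReview.Request.Resource is the group/version/resource of nodes
  • decode AdmissionReview.Request.Object.Raw into a node object
  • set AdmissionReview.Response.Allowed to true, i.e. the request is always admitted
  • run the core logic that supports the edge health check: if the node's Ready condition is Unknown and the distributed health check reports the node as healthy, remove the NoExecute (node.kubernetes.io/unreachable) taint when it is present

In short, mutateNodeTaint continuously corrects the node status written by kube-controller-manager, removing the NoExecute (node.kubernetes.io/unreachable) taint so that the pods on the node are not evicted.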
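The decision in mutateNodeTaint reduces to a small pure function. The Go sketch below reproduces it with standard-library stand-ins; the `taint` and `patchOp` types and `nodeTaintPatch` are hypothetical. The real code reads `common.NodeHealthAnnotation` and compares it with `common.NodeHealthAnnotationPros`, whose literal values are not shown in this article, so the sketch takes the healthy value as a parameter and uses the placeholder `"healthy"`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type taint struct {
	Key    string
	Effect string
}

type patchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

// unreachableTaintKey is the taint key kube-controller-manager uses for unreachable nodes.
const unreachableTaintKey = "node.kubernetes.io/unreachable"

// nodeTaintPatch returns the JSON Patch that drops the unreachable NoExecute taint,
// but only when the Ready condition is Unknown and the edge vote equals healthyValue.
func nodeTaintPatch(readyStatus, healthVote, healthyValue string, taints []taint) []patchOp {
	if readyStatus != "Unknown" || healthVote != healthyValue {
		return nil
	}
	for i, t := range taints {
		if t.Key == unreachableTaintKey && t.Effect == "NoExecute" {
			return []patchOp{{Op: "remove", Path: fmt.Sprintf("/spec/taints/%d", i)}}
		}
	}
	return nil
}

func main() {
	taints := []taint{
		{Key: unreachableTaintKey, Effect: "NoSchedule"},
		{Key: unreachableTaintKey, Effect: "NoExecute"},
	}
	// "healthy" is a placeholder for the real annotation value.
	patch := nodeTaintPatch("Unknown", "healthy", "healthy", taints)
	out, _ := json.Marshal(patch)
	fmt.Println(string(out)) // [{"op":"remove","path":"/spec/taints/1"}]
}
```

Note that the NoSchedule taint is left in place: new pods still avoid the disconnected node, and only the eviction of running pods is prevented.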

2. mutateEndpoint

mutateEndpoint modifies endpoints UPDATE requests according to the distributed health check result:

func (eha *EdgeHealthAdmission) mutateEndpoint(ar admissionv1.AdmissionReview) *admissionv1.AdmissionResponse {
    klog.V(4).Info("mutating endpoint")
    endpointResource := metav1.GroupVersionResource{Group: "", Version: "v1", Resource: "endpoints"}
    if ar.Request.Resource != endpointResource {
        klog.Errorf("expect resource to be %s", endpointResource)
        // Admit unchanged instead of returning nil, which serve would dereference
        return &admissionv1.AdmissionResponse{Allowed: true}
    }
    var endpoint corev1.Endpoints
    deserializer := codecs.UniversalDeserializer()
    if _, _, err := deserializer.Decode(ar.Request.Object.Raw, nil, &endpoint); err != nil {
        klog.Error(err)
        return toAdmissionResponse(err)
    }
    reviewResponse := admissionv1.AdmissionResponse{}
    reviewResponse.Allowed = true
    // Collect the patches of all subsets and addresses into one JSON Patch,
    // instead of overwriting reviewResponse.Patch for every address
    var patches []*patch
    for epSubsetIndex, epSubset := range endpoint.Subsets {
        hasAddresses := len(epSubset.Addresses) > 0
        // Iterate backwards so each "remove" index stays valid after earlier removals in the same patch
        for notReadyAddrIndex := len(epSubset.NotReadyAddresses) - 1; notReadyAddrIndex >= 0; notReadyAddrIndex-- {
            endpointAddress := epSubset.NotReadyAddresses[notReadyAddrIndex]
            if endpointAddress.NodeName == nil {
                continue
            }
            node, err := eha.nodeLister.Get(*endpointAddress.NodeName)
            if err != nil {
                klog.Errorf("Get pod's node err %+v", err)
                continue
            }
            if index, condition := util.GetNodeCondition(&node.Status, v1.NodeReady); index == -1 || condition.Status != v1.ConditionUnknown {
                continue
            }
            if healthy, existed := node.Annotations[common.NodeHealthAnnotation]; !existed || healthy != common.NodeHealthAnnotationPros {
                continue
            }
            // TODO: handle readiness probes failure
            // Remove address on node from endpoint notReadyAddresses
            patches = append(patches, &patch{
                OP:   "remove",
                Path: fmt.Sprintf("/subsets/%d/notReadyAddresses/%d", epSubsetIndex, notReadyAddrIndex),
            })
            // Add address on node to endpoint addresses; corev1.EndpointAddress marshals
            // to ip, hostname, nodeName and targetRef, omitting empty fields
            if hasAddresses {
                patches = append(patches, &patch{
                    OP:    "add",
                    Path:  fmt.Sprintf("/subsets/%d/addresses/0", epSubsetIndex),
                    Value: endpointAddress,
                })
            } else {
                // "/addresses/0" requires an existing array, so create it with this address
                patches = append(patches, &patch{
                    OP:    "add",
                    Path:  fmt.Sprintf("/subsets/%d/addresses", epSubsetIndex),
                    Value: []corev1.EndpointAddress{endpointAddress},
                })
                hasAddresses = true
            }
        }
    }
    if len(patches) != 0 {
        patchBytes, _ := json.Marshal(patches)
        reviewResponse.Patch = patchBytes
        pt := admissionv1.PatchTypeJSONPatch
        reviewResponse.PatchType = &pt
    }
    return &reviewResponse
}

The main logic is:

  • check that AdmissionReview.Request.Resource is the group/version/resource of endpoints
  • decode AdmissionReview.Request.Object.Raw into an endpoints object
  • set AdmissionReview.Response.Allowed to true, i.e. the request is always admitted
  • iterate over endpoints.Subsets[].NotReadyAddresses; if the node hosting an EndpointAddress has its Ready condition Unknown and the distributed health check reports it as healthy, move that EndpointAddress from NotReadyAddresses to Addresses

In short, mutateEndpoint continuously corrects the endpoints written by kube-controller-manager, moving the workloads on nodes that the distributed health check considers healthy from endpoints.Subsets[].NotReadyAddresses to endpoints.Subsets[].Addresses, so that the services stay available.
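The patch construction behind mutateEndpoint can be sketched the same way. This Go sketch (standard library only; `address`, `subset`, `patchOp`, `endpointPatch` and the `healthyUnknownNodes` set are hypothetical stand-ins for the corev1 types and the node lister lookup) collects the ops for every qualifying address into one patch, walks notReadyAddresses from the end so that each `remove` index is still valid after earlier removals in the same patch, and creates the `addresses` array when a subset has none yet, since a JSON Patch `add` at `/addresses/0` needs an existing array.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type address struct {
	IP       string `json:"ip"`
	NodeName string `json:"nodeName,omitempty"`
}

type subset struct {
	Addresses         []address
	NotReadyAddresses []address
}

type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// endpointPatch moves not-ready addresses on nodes in healthyUnknownNodes
// (Ready condition Unknown, but voted healthy by the edge) into the ready addresses.
func endpointPatch(subsets []subset, healthyUnknownNodes map[string]bool) []patchOp {
	var ops []patchOp
	for s, sub := range subsets {
		hasAddresses := len(sub.Addresses) > 0
		for i := len(sub.NotReadyAddresses) - 1; i >= 0; i-- {
			addr := sub.NotReadyAddresses[i]
			if !healthyUnknownNodes[addr.NodeName] {
				continue
			}
			ops = append(ops, patchOp{Op: "remove", Path: fmt.Sprintf("/subsets/%d/notReadyAddresses/%d", s, i)})
			if hasAddresses {
				ops = append(ops, patchOp{Op: "add", Path: fmt.Sprintf("/subsets/%d/addresses/0", s), Value: addr})
			} else {
				// No addresses array yet: create it with this first element.
				ops = append(ops, patchOp{Op: "add", Path: fmt.Sprintf("/subsets/%d/addresses", s), Value: []address{addr}})
				hasAddresses = true
			}
		}
	}
	return ops
}

func main() {
	subsets := []subset{{
		NotReadyAddresses: []address{
			{IP: "10.0.0.1", NodeName: "edge-a"},
			{IP: "10.0.0.2", NodeName: "edge-b"},
			{IP: "10.0.0.3", NodeName: "edge-a"},
		},
	}}
	ops := endpointPatch(subsets, map[string]bool{"edge-a": true})
	out, _ := json.Marshal(ops)
	fmt.Println(string(out))
}
```

Applied in order, these four ops leave 10.0.0.2 (on the node without a healthy vote) in notReadyAddresses and move 10.0.0.1 and 10.0.0.3 into addresses.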

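Summary

  • SuperEdge's distributed health check feature consists of edge-health-daemon on the edge side and edge-health-admission on the cloud side:

    • edge-health-daemon: performs distributed health checks on edge nodes in the same region and sends the resulting health-status votes to the apiserver (as annotations on the node)
    • edge-health-admission: continuously adjusts, based on the node's edge-health annotation, the node taints set by kube-controller-manager (removing the NoExecute taint) and the endpoints (moving pods on disconnected nodes from the endpoint subsets' notReadyAddresses to addresses), so that the cloud and the edge jointly determine node status
  • The cloud-side edge-health-admission component exists because, when the cloud-edge connection is lost, kube-controller-manager sets the disconnected node's Ready condition to Unknown, adds the NoSchedule and NoExecute taints, and moves the pods on that node out of the ready addresses of their Services' Endpoints. When edge-health-daemon finds the node healthy, it updates the node to remove the NoExecute taint, but kube-controller-manager adds the taint back right after the update succeeds. A Kubernetes mutating admission webhook, namely edge-health-admission, is therefore needed to adjust kube-controller-manager's changes to the node API resource so that the distributed health check takes effect
  • Kubernetes Admission Controllers are a stage in kube-apiserver's handling of API requests, invoked after authentication & authorization and before the object is persisted, to validate or mutate the request (or both). Most admission controllers are compiled into kube-apiserver; MutatingAdmissionWebhook and ValidatingAdmissionWebhook are special in that they call external mutating and validating admission control webhooks, respectively
  • Admission webhooks are HTTP callback services that receive and process AdmissionReview requests. By how they process requests, they fall into two types:

    • validating admission webhook: configured via ValidatingWebhookConfiguration; performs admission validation on API requests but cannot modify the request object
    • mutating admission webhook: configured via MutatingWebhookConfiguration; performs admission validation on API requests and can also modify the request object
  • kube-apiserver sends webhooks an AdmissionReview (apiGroup: admission.k8s.io, apiVersion: v1 or v1beta1) encoded as JSON; the webhook replies with an AdmissionReview of the same version, also encoded as JSON, containing the following key fields:

    • uid: copied from request.uid of the AdmissionReview sent to the webhook
    • allowed: true admits the request; false rejects it
    • status: when rejecting a request, the reason can be given in status (HTTP code and message)
    • patch: base64-encoded; a series of JSON Patch operations that the mutating admission webhook applies to the request object
    • patchType: currently only JSONPatch is supported
  • edge-health-admission is essentially a mutating admission webhook that selectively modifies UPDATE requests for endpoints and nodes, with the following handlers:

    • mutateNodeTaint: continuously corrects the node status written by kube-controller-manager, removing the NoExecute (node.kubernetes.io/unreachable) taint so that the pods on the node are not evicted
    • mutateEndpoint: continuously corrects the endpoints written by kube-controller-manager, moving the workloads on nodes that the distributed health check considers healthy from endpoints.Subsets[].NotReadyAddresses to endpoints.Subsets[].Addresses, so that the services stay available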
