On Caching: Invisible Wings for Data Elasticity


Author | 车漾 (Che Yang), Fluid community Committer

Author | 谢远东 (Xie Yuandong), Fluid community Committer

Background

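As more and more data-intensive applications, such as big data and AI workloads, are deployed and run on Kubernetes, the mismatch between the design philosophy of data-intensive computing frameworks and flexible cloud-native application orchestration has led to bottlenecks in data access and computation. Fluid, a cloud-native data orchestration engine, provides data access acceleration for applications through its Dataset abstraction, distributed caching, and integration with the scheduler.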

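Elastic scaling is one of the core capabilities of Kubernetes, but it has always centered on stateless application workloads. Fluid adds elastic scaling for distributed caches, so a data cache can be flexibly expanded and shrunk. Using metrics exposed by the Runtime, such as cache capacity and the proportion of cache currently in use, together with its own ability to scale Runtime resources, Fluid can scale the data cache on demand.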

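This capability matters a great deal for big data applications in Internet scenarios, because most of them are implemented as end-to-end pipelines. Such a pipeline contains the following steps:

Data extraction: preprocess the raw data with big data technologies such as Spark and MapReduce
Model training: train a machine learning model on the feature data produced in the first stage and generate the corresponding model
Model evaluation: evaluate and test the model produced in the second stage against a test or validation set
Model inference: push the model validated in the third stage online to serve inference requests for the business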

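As you can see, an end-to-end pipeline involves several different types of computation, and for each of them there is, in practice, a suitable specialized system (TensorFlow, PyTorch, Spark, Presto). These systems are independent of one another, however, and usually rely on an external file system to pass data from one stage to the next. Exchanging data through the file system this frequently incurs heavy I/O overhead, which often becomes the bottleneck of the whole workflow.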

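Fluid is a good fit for this scenario. A user can create a Dataset object that distributes cached data across Kubernetes compute nodes and serves as the medium for data exchange, which avoids remote writes and reads and makes data use more efficient. The difficulty lies in estimating and reserving resources for this temporary data cache. Before the data is produced and consumed, its volume is hard to estimate accurately: overestimating wastes reserved resources, while underestimating raises the chance that writes will fail. Scaling on demand is friendlier to users. Our goal is something like the page cache: the layer is transparent to end users, yet the acceleration it provides is real.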

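We introduced cache elasticity into Fluid through a custom HPA mechanism. Scaling is triggered when the amount of cached data reaches a certain proportion of the cache space, at which point the cache space is expanded. For example, if the trigger is set to cache usage above 75% and the total cache space is 10G, scale-out is triggered once the data fills 8G of the cache (80%).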
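The trigger condition is a plain percentage comparison. The following Python sketch reproduces the 10G / 75% example; the function names are hypothetical and only illustrate the arithmetic, not Fluid's implementation:

```python
def cache_usage_pct(used_gib: float, total_gib: float) -> float:
    """Percentage of the cache space currently occupied by data."""
    if total_gib <= 0:
        raise ValueError("total cache capacity must be positive")
    return used_gib * 100 / total_gib


def should_scale_out(used_gib: float, total_gib: float, threshold_pct: float) -> bool:
    """True once cache usage is strictly above the configured threshold."""
    return cache_usage_pct(used_gib, total_gib) > threshold_pct


# The example from the text: 10G of cache space and a 75% threshold.
print(cache_usage_pct(8, 10))       # 80.0
print(should_scale_out(8, 10, 75))  # True
print(should_scale_out(7, 10, 75))  # False
```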

The following example walks you through Fluid's autoscaling capability.

Prerequisites

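Kubernetes 1.18 or later is recommended. Before 1.18, HPA scaling behavior could not be customized and was hard-coded; from 1.18 on, users can define their own scaling policies, for example the cooldown period after a scale-out.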

Steps

1. Install jq to make JSON parsing easier. This example runs on CentOS, where jq can be installed with yum:

yum install -y jq

2. Download and install the latest version of Fluid

git clone https://github.com/fluid-cloudnative/fluid.git
cd fluid/charts
kubectl create ns fluid-system
helm install fluid fluid

3. Deploy or configure Prometheus

Prometheus is used here to collect the metrics exposed by the AlluxioRuntime cache engine. If the cluster has no Prometheus yet, run:

$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml

If the cluster already runs Prometheus, add the following to the Prometheus configuration file:

scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_monitor]
      regex: alluxio_runtime_metrics
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_release]
      target_label: fluid_runtime
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_name]
      target_label: pod
      replacement: $1
      action: replace
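The relabel rules above decide which endpoints are scraped and which labels the samples carry; in particular, the Service's release label is copied into fluid_runtime, which the adapter rule in step 6 maps to a Dataset. Below is a simplified Python model of the keep and replace actions. Assumptions: only these two actions are modeled, only $N-style references are expanded, and the label values (such as the pod name spark-master-0) are hypothetical:

```python
import re

# The rules from the scrape config above. Prometheus joins source_labels with
# ";" and anchors the regex on both ends; the default regex is "(.*)", which is
# why the replace rules (which set no regex) copy the source value unchanged.
RULES = [
    {"source_labels": ["__meta_kubernetes_service_label_monitor"],
     "regex": "alluxio_runtime_metrics", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_endpoint_port_name"],
     "regex": "web", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_namespace"],
     "target_label": "namespace", "replacement": "$1", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_service_label_release"],
     "target_label": "fluid_runtime", "replacement": "$1", "action": "replace"},
    {"source_labels": ["__meta_kubernetes_endpoint_address_target_name"],
     "target_label": "pod", "replacement": "$1", "action": "replace"},
]


def relabel(labels, rules):
    """Return the relabeled label set, or None if a keep rule drops the target."""
    out = dict(labels)
    for rule in rules:
        # Missing labels count as empty strings, as in Prometheus.
        value = ";".join(out.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if rule["action"] == "keep":
            if match is None:
                return None
        elif rule["action"] == "replace" and match is not None:
            # Expand $1-style references into the regex capture groups.
            out[rule["target_label"]] = re.sub(
                r"\$(\d+)",
                lambda ref: match.group(int(ref.group(1))) or "",
                rule["replacement"],
            )
    return out


# Hypothetical endpoint of an AlluxioRuntime named "spark".
target = {
    "__meta_kubernetes_service_label_monitor": "alluxio_runtime_metrics",
    "__meta_kubernetes_endpoint_port_name": "web",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_label_release": "spark",
    "__meta_kubernetes_endpoint_address_target_name": "spark-master-0",
}
kept = relabel(target, RULES)
print(kept["fluid_runtime"], kept["namespace"], kept["pod"])  # spark default spark-master-0
print(relabel({**target, "__meta_kubernetes_endpoint_port_name": "rpc"}, RULES))  # None
```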

4. Verify that Prometheus was installed successfully

$ kubectl get ep -n kube-system  prometheus-svc
NAME             ENDPOINTS        AGE
prometheus-svc   10.76.0.2:9090   6m49s
$ kubectl get svc -n kube-system prometheus-svc
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-svc   NodePort   172.16.135.24   <none>        9090:32114/TCP   2m7s

If you want to visualize the metrics, you can install Grafana to inspect the monitoring data; see the documentation for details.

5. Deploy metrics-server

Check whether the cluster already has metrics-server: if kubectl top node correctly shows memory and CPU usage, metrics-server is configured correctly.

kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.1.204   93m          2%     1455Mi          10%
192.168.1.205   125m         3%     1925Mi          13%
192.168.1.206   96m          2%     1689Mi          11%

Otherwise, run the following command manually:

kubectl create -f integration/metrics-server

6. Deploy the custom-metrics-api component

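Scaling on custom metrics requires two components. The first collects metrics from the application and stores them in the Prometheus time-series database. The second uses the collected metrics to extend the Kubernetes custom metrics API; this is k8s-prometheus-adapter. The first component was deployed in step 3; deploy the second one as follows.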

If custom-metrics-api is already configured, add the Dataset-related rule to the adapter's ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
      - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))

Otherwise, run the following commands manually:

kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api

Note: custom-metrics-api connects to the Prometheus instance in your cluster, so replace the Prometheus URL in its configuration with the address of the Prometheus you actually use.
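The metricsQuery in the adapter rule turns two Alluxio gauges into an integer percentage. A Python equivalent, assuming Cluster_CapacityUsed and Cluster_CapacityTotal are reported in the same unit (bytes are used here); because of the ceil, any fractional percent rounds up, so 1020.92MiB used out of 1GiB yields 100 rather than 99.7:

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3


def capacity_used_rate(cluster_capacity_used: float, cluster_capacity_total: float) -> int:
    """Mirror of the adapter's metricsQuery:
    ceil(Cluster_CapacityUsed * 100 / Cluster_CapacityTotal)."""
    return math.ceil(cluster_capacity_used * 100 / cluster_capacity_total)


print(capacity_used_rate(0, 1 * GiB))              # 0
print(capacity_used_rate(1020.92 * MiB, 1 * GiB))  # 100
```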

Check the custom metrics:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "datasets.data.fluid.io/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/capacity_used_rate",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
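To script this check rather than reading the jq output by eye, here is a minimal Python sketch that looks for the Dataset metric in the discovery document. The JSON is the output above trimmed to the fields used, and has_metric is a hypothetical helper:

```python
import json

# Body of: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" (trimmed).
discovery = json.loads("""
{
  "kind": "APIResourceList",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "pods/capacity_used_rate", "namespaced": true},
    {"name": "datasets.data.fluid.io/capacity_used_rate", "namespaced": true},
    {"name": "namespaces/capacity_used_rate", "namespaced": false}
  ]
}
""")


def has_metric(api_resource_list: dict, resource: str, metric: str) -> bool:
    """True if the custom metrics API advertises <resource>/<metric>."""
    wanted = f"{resource}/{metric}"
    return any(r["name"] == wanted for r in api_resource_list.get("resources", []))


print(has_metric(discovery, "datasets.data.fluid.io", "capacity_used_rate"))  # True
```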

7. Submit the Dataset used for testing

$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created

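8. Check whether the Dataset is available (PHASE Bound). The dataset's total size is 2.71GiB, while Fluid currently provides one cache node with a maximum cache capacity of 1GiB, so the cache cannot hold the whole dataset.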

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          0.00B    1.00GiB          0.0%                Bound   7m38s
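Whether the cache can hold the whole dataset comes down to a ceiling division. A Python sketch, assuming each AlluxioRuntime replica contributes exactly its 1Gi MEM quota and ignoring the high/low watermarks (with high: "0.99" the answer here would still be 3); replicas_to_cache_all is a hypothetical helper:

```python
import math


def replicas_to_cache_all(ufs_total_gib: float, quota_per_replica_gib: float) -> int:
    """Smallest number of cache replicas whose combined quota covers the dataset."""
    return math.ceil(ufs_total_gib / quota_per_replica_gib)


print(replicas_to_cache_all(2.71, 1.0))  # 3, within maxReplicas: 4 set later
```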

9. Once the Dataset is available, check whether the metric can be retrieved from custom-metrics-api:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}
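The value field of a MetricValueList is a Kubernetes quantity string. A sketch that extracts it for one Dataset from the response above, trimmed to the relevant fields; dataset_metric is a hypothetical helper and returns the raw string rather than parsing quantities:

```python
import json

# Body of the MetricValueList query above (trimmed).
metric_values = json.loads("""
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {"kind": "Dataset", "namespace": "default",
                          "name": "spark", "apiVersion": "data.fluid.io/v1alpha1"},
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}
""")


def dataset_metric(values: dict, dataset: str, metric: str) -> str:
    """Return the raw value string served for one Dataset's metric."""
    for item in values["items"]:
        obj = item["describedObject"]
        if obj["kind"] == "Dataset" and obj["name"] == dataset and item["metricName"] == metric:
            return item["value"]
    raise KeyError(f"{metric} not found for Dataset {dataset}")


print(dataset_metric(metric_values, "spark", "capacity_used_rate"))  # 0
```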

10. Create the HPA

$ cat<<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      metric:
        name: capacity_used_rate
      describedObject:
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        name: spark
      target:
        type: Value
        value: "90"
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2
        periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF
$ kubectl create -f hpa.yaml

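First, let's walk through the sample configuration. It has two main parts: the scaling rule and the scaling behavior (how aggressively it scales):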

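Rule: the scale target is the AlluxioRuntime, with at least 1 and at most 4 replicas. The HPA aims to keep the Dataset's capacity_used_rate (cached data as a percentage of total cache capacity) at 90, and scales out when usage climbs above this target (subject to the HPA's default 10% tolerance). The Dataset and the AlluxioRuntime must be in the same namespace.
Behavior: on Kubernetes 1.18 and later, scale-up and scale-down can each be given a stabilization window and a maximum step per period. In this example the scale-up policy adds at most 2 replicas within any 10-minute period (periodSeconds: 600), still bounded by maxReplicas. A stabilization window (stabilizationWindowSeconds, for instance 20 minutes) could also be configured, but this sample does not set one, which is why kubectl describe hpa reports a 0-second stabilization window for scale-up. Scale-down is disabled outright (selectPolicy: Disabled).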
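The HPA's decision for this configuration can be approximated in a few lines. The Python sketch below is a simplification, not the controller's code: it assumes the default 10% tolerance, computes ceil(current × metric / target), caps scale-up with the Pods policy counted from the replica count at the start of the period, treats scale-down as disabled, and ignores ready-pod accounting and stabilization windows:

```python
import math

TOLERANCE = 0.1  # kube-controller-manager default (--horizontal-pod-autoscaler-tolerance)


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int, max_replicas: int,
                     added_in_period: int = 0, pods_per_period: int = 2) -> int:
    """Simplified HPA decision for an Object metric with a Value target."""
    ratio = metric / target
    if abs(ratio - 1.0) <= TOLERANCE:
        return current  # inside the tolerance band: no change
    proposal = math.ceil(current * ratio)
    if proposal <= current:
        return current  # scale-down is disabled in this sample
    # Pods policy: at most `pods_per_period` replicas may be added per period,
    # counted from the replica count at the start of that period.
    scale_up_limit = (current - added_in_period) + pods_per_period
    return max(min_replicas, min(proposal, scale_up_limit, max_replicas))


# Replaying the metric values reported in this walkthrough.
print(desired_replicas(1, 0, 90, 1, 4))                       # 1
print(desired_replicas(1, 100, 90, 1, 4))                     # 2
print(desired_replicas(2, 100, 90, 1, 4, added_in_period=1))  # 3
print(desired_replicas(3, 85, 90, 1, 4, added_in_period=2))   # 3
```

With these inputs the sketch reproduces the 1 → 2 → 3 progression, and shows that 85 against a target of 90 leaves the replica count unchanged because the ratio falls inside the tolerance band.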

11. Check the HPA. The proportion of used cache space is currently 0, far below the scale-out condition.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   0/90      1         4         1          33s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:36:39 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  0 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:           <none>

12. Create a data preload (DataLoad) task

$ cat<<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME    DATASET   PHASE       AGE   DURATION
spark   spark     Executing   15s   Unfinished

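13. The amount of cached data is now close to the cache capacity Fluid can provide (1GiB), which meets the scaling condition. Note that CACHED PERCENTAGE is measured against the UFS total size (2.71GiB), not against the cache capacity, which is why it reads only 36.8% here.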

$  kubectl  get dataset
NAME    UFS TOTAL SIZE   CACHED       CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          1020.92MiB   1.00GiB          36.8%               Bound   5m15s
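A quick Python check of the two percentages behind this output, using the figures from the table (GiB and MiB as binary units). The helper is hypothetical, and the one-decimal rounding of CACHED PERCENTAGE is an assumption that happens to match the displayed values:

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3


def cache_percentages(ufs_total: float, cached: float, cache_capacity: float):
    """Return (CACHED PERCENTAGE, capacity_used_rate) for byte quantities."""
    # kubectl get dataset: cached data relative to the whole dataset.
    cached_percentage = round(cached * 100 / ufs_total, 1)
    # HPA metric: cached data relative to the cache capacity, rounded up
    # like the adapter's metricsQuery.
    capacity_used_rate = math.ceil(cached * 100 / cache_capacity)
    return cached_percentage, capacity_used_rate


print(cache_percentages(2.71 * GiB, 1020.92 * MiB, 1.00 * GiB))  # (36.8, 100)
```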

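The HPA status shows that the AlluxioRuntime has started scaling out: the events record the replica count going from 1 to 2 and then to 3, i.e. 2 replicas added within one 600-second policy period, matching the Pods policy value of 2.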

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   100/90    1         4         2          4m20s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:56:31 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  100 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   2 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 3
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Normal   SuccessfulRescale             21s                    horizontal-pod-autoscaler  New size: 2; reason: Dataset metric capacity_used_rate above target
  Normal   SuccessfulRescale             6s                     horizontal-pod-autoscaler  New size: 3; reason: Dataset metric capacity_used_rate above target

14. After a while, the dataset's cache capacity has grown from 1GiB to 3GiB, and caching of the data is nearly complete.

$ kubectl  get dataset
NAME    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          2.59GiB   3.00GiB          95.6%               Bound   12m

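Meanwhile, the HPA status shows that the runtime backing the Dataset now has 3 replicas and that the used cache ratio, capacity_used_rate, is 85%, below the target of 90, so no further scale-out is triggered.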

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   85/90     1         4         3          11m

15. Clean up the environment

kubectl delete hpa spark
kubectl delete dataset spark

Summary

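By combining Prometheus, Kubernetes HPA, and custom metrics, Fluid can trigger autoscaling based on the proportion of cache space in use, so that cache capacity is consumed on demand. This lets users apply distributed caching to accelerate data access more flexibly. Next, we plan to add scheduled scaling to make scaling behavior more predictable.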

Fluid code repository: https://github.com/fluid-cloudnative/fluid.git (you are welcome to follow the project, contribute code, and give it a star).

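Copyright notice: This article was voluntarily contributed by a real-name registered Alibaba Cloud user, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and assumes no related legal liability. For the detailed rules, see the Alibaba Cloud Developer Community User Service Agreement and the Alibaba Cloud Developer Community Intellectual Property Protection Guidelines. If you find suspected plagiarism in this community, report it through the infringement complaint form; once verified, the community will immediately remove the infringing content.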