This article is from Rancher Labs.

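In previous articles we have spent a good deal of space on monitoring. That is because everything changes very quickly when you manage a Kubernetes cluster, so it is essential to have a tool that monitors the cluster's health and resource metrics.

In Rancher 2.5 we introduced a new version of monitoring based on the Prometheus Operator, which provides Kubernetes-native deployment and management of Prometheus and related monitoring components. The Prometheus Operator lets you monitor the state and processes of cluster nodes, Kubernetes components and application workloads. It also lets you define alerts on the metrics Prometheus collects and create custom dashboards, so the collected metrics can be visualized easily in Grafana. See the following link for more detail on the new monitoring components: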

https://rancher.com/docs/ranc...

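The new monitoring also ships prometheus-adapter, which developers can use to scale their workloads based on custom metrics and the Horizontal Pod Autoscaler (HPA).

In this article we explore how to use the Prometheus Operator to scrape custom metrics and how to use those metrics for advanced workload management.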

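Installing Prometheus

Installing Prometheus in Rancher 2.5 is very simple. Just go to Cluster Explorer -> Apps and install rancher-monitoring.

You should be aware of the following default settings:

  • prometheus-adapter is enabled as part of the chart installation
  • ServiceMonitorNamespaceSelector is left empty, which allows Prometheus to collect ServiceMonitors in all namespaces

Once the installation completes, we can access the monitoring components from Cluster Explorer.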

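Deploying the workload

Now let's deploy a sample workload that exposes custom metrics from the application layer. The workload runs a simple application that has been instrumented with the Prometheus client_golang library and serves a few custom metrics on the /metrics endpoint.

It has two metrics:

  • http_requests_total
  • http_request_duration_seconds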
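
To make the metric names more concrete, here is a dependency-free sketch of roughly what a scrape of the counter could look like in the Prometheus text exposition format. It is not the example app's code (that uses client_golang); the help text and the sample counts are made up for illustration, and only the `code` label is taken from the dashboard queries later in this article.

```python
# Sketch only: imitates the Prometheus text exposition format for a counter.
# The help text and sample values below are hypothetical.

def render_counter(name, help_text, samples):
    """Render a counter; samples maps a tuple of (label, value) pairs to a count."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)


requests = {
    (("code", "200"),): 42,  # hypothetical count of successful requests
    (("code", "404"),): 3,   # hypothetical count of failed requests
}
print(render_counter("http_requests_total", "Count of all HTTP requests", requests))
```

Each label combination becomes its own time series, which is why the dashboard queries can filter on `code`.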

The following manifest deploys the workload, its service, and an ingress for reaching the workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: prometheus-example-app
  name: prometheus-example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus-example-app
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus-example-app
    spec:
      containers:
      - name: prometheus-example-app
        image: gmehta3/demo-app:metrics
        ports:
        - name: web
          containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-example-app
  labels:
    app.kubernetes.io/name: prometheus-example-app
spec:
  selector:
    app.kubernetes.io/name: prometheus-example-app
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
      name: web
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: prometheus-example-app
spec:
  rules:
  - host: hpa.demo
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-example-app
          servicePort: 8080

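Deploying the ServiceMonitor

ServiceMonitor is a custom resource definition (CRD) that lets us declaratively define how a dynamic set of services should be monitored.

See the following link for the full ServiceMonitor specification: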

https://github.com/prometheus...

Now let's deploy the ServiceMonitor, which Prometheus uses to discover and scrape the pods that make up the prometheus-example-app Kubernetes service.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-app
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus-example-app
  endpoints:
  - port: web

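Users can now browse the ServiceMonitor in Rancher monitoring.

Shortly afterwards, the new service monitor and the pods associated with the service should show up in Prometheus service discovery.

We can also see the metrics in Prometheus.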
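
The ServiceMonitor finds its targets through `matchLabels`: a service is selected only when it carries every label listed in the selector. The sketch below illustrates that matching rule in isolation. It is not the operator's code, and the `other-service` entry is a hypothetical service added for contrast.

```python
# Sketch of matchLabels semantics: every selector pair must be present on the object.

def matches(selector, labels):
    """Return True when every key/value pair in the selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


selector = {"app.kubernetes.io/name": "prometheus-example-app"}
services = {
    "prometheus-example-app": {"app.kubernetes.io/name": "prometheus-example-app"},
    "other-service": {"app.kubernetes.io/name": "other"},  # hypothetical service
}
selected = [name for name, labels in services.items() if matches(selector, labels)]
print(selected)  # ['prometheus-example-app']
```

If the service does not appear in service discovery, a mismatch between the selector and the service's labels is a good first thing to check.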

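Deploying the Grafana dashboard

In Rancher 2.5, monitoring lets users store Grafana dashboards as ConfigMaps in the cattle-dashboards namespace.

Users or cluster admins can now add more dashboards to this namespace to extend Grafana's set of custom dashboards.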

Dashboard ConfigMap Example
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-example-app-dashboard
  namespace: cattle-dashboards
  labels:
    grafana_dashboard: "1"
data:
  prometheus-example-app.json: |
    {
      "annotations": {
        "list": [
          {
            "builtIn": 1,
            "datasource": "-- Grafana --",
            "enable": true,
            "hide": true,
            "iconColor": "rgba(0, 211, 255, 1)",
            "name": "Annotations & Alerts",
            "type": "dashboard"
          }
        ]
      },
      "editable": true,
      "gnetId": null,
      "graphTooltip": 0,
      "links": [],
      "panels": [
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 9,
            "w": 12,
            "x": 0,
            "y": 0
          },
          "hiddenSeries": false,
          "id": 2,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "percentage": false,
          "pluginVersion": "7.1.5",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "rate(http_requests_total{code=\"200\",service=\"prometheus-example-app\"}[5m])",
              "instant": false,
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "http_requests_total_200",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        },
        {
          "aliasColors": {},
          "bars": false,
          "dashLength": 10,
          "dashes": false,
          "datasource": null,
          "description": "",
          "fieldConfig": {
            "defaults": {
              "custom": {}
            },
            "overrides": []
          },
          "fill": 1,
          "fillGradient": 0,
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 0,
            "y": 9
          },
          "hiddenSeries": false,
          "id": 4,
          "legend": {
            "avg": false,
            "current": false,
            "max": false,
            "min": false,
            "show": true,
            "total": false,
            "values": false
          },
          "lines": true,
          "linewidth": 1,
          "nullPointMode": "null",
          "percentage": false,
          "pluginVersion": "7.1.5",
          "pointradius": 2,
          "points": false,
          "renderer": "flot",
          "seriesOverrides": [],
          "spaceLength": 10,
          "stack": false,
          "steppedLine": false,
          "targets": [
            {
              "expr": "rate(http_requests_total{code!=\"200\",service=\"prometheus-example-app\"}[5m])",
              "interval": "",
              "legendFormat": "",
              "refId": "A"
            }
          ],
          "thresholds": [],
          "timeFrom": null,
          "timeRegions": [],
          "timeShift": null,
          "title": "http_requests_total_not_200",
          "tooltip": {
            "shared": true,
            "sort": 0,
            "value_type": "individual"
          },
          "type": "graph",
          "xaxis": {
            "buckets": null,
            "mode": "time",
            "name": null,
            "show": true,
            "values": []
          },
          "yaxes": [
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            },
            {
              "format": "short",
              "label": null,
              "logBase": 1,
              "max": null,
              "min": null,
              "show": true
            }
          ],
          "yaxis": {
            "align": false,
            "alignLevel": null
          }
        }
      ],
      "schemaVersion": 26,
      "style": "dark",
      "tags": [],
      "templating": {
        "list": []
      },
      "time": {
        "from": "now-15m",
        "to": "now"
      },
      "timepicker": {
        "refresh_intervals": [
          "5s",
          "10s",
          "30s",
          "1m",
          "5m",
          "15m",
          "30m",
          "1h",
          "2h",
          "1d"
        ]
      },
      "timezone": "",
      "title": "prometheus example app",
      "version": 1
    }

Users should now be able to open the prometheus example app dashboard in Grafana.
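
Both dashboard panels plot `rate(http_requests_total{...}[5m])`, that is, the per-second increase of the counter over a five-minute window. As a rough intuition for what the graph shows, the sketch below computes a simplified rate from a list of samples. It deliberately ignores the counter-reset handling and window extrapolation that PromQL's `rate()` performs, and the sample values are invented.

```python
# Simplified sketch of a per-second rate over a window of counter samples.
# Unlike PromQL's rate(), it does not handle counter resets or extrapolate.

def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample."""
    if len(samples) < 2:
        return None  # not enough data points in the window
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Hypothetical scrapes: one sample per minute across a 5-minute window.
samples = [(0, 100), (60, 160), (120, 220), (180, 280), (240, 340), (300, 400)]
print(simple_rate(samples))  # 1.0 request per second
```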

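HPA with custom metrics

This section assumes that prometheus-adapter is already installed as part of monitoring. In fact, the monitoring installer installs prometheus-adapter by default.

Users can now create an HPA spec such as the following: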

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: prometheus-example-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Service
        name: prometheus-example-app
      metric:
        name: http_requests
      target:
        averageValue: "5"
        type: AverageValue

See the following link for more information on HPA:

https://kubernetes.io/docs/ta...

We will use the custom http_requests_total metric, referenced in the HPA spec as http_requests, to drive pod autoscaling.
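
To get a feel for how the HPA could react to this spec, the sketch below approximates the replica calculation for an Object metric with an AverageValue target: the metric value is divided by the current pod count before it is compared with the target, and the result is clamped between minReplicas and maxReplicas. This is a simplification based on the documented algorithm, not the controller's code. It assumes the default tolerance of 0.1 and ignores stabilization windows and pod readiness, and the metric values passed in are hypothetical.

```python
import math

# Sketch of the HPA replica calculation for an Object metric with AverageValue.
# Assumes the default 0.1 tolerance; ignores stabilization and pod readiness.

def desired_replicas(metric_value, target_average, current,
                     min_replicas=1, max_replicas=5, tolerance=0.1):
    """Approximate the replica count the HPA would ask for."""
    # With AverageValue, the object's metric is averaged over the current pods.
    ratio = (metric_value / current) / target_average
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: keep the current count
    wanted = math.ceil(metric_value / target_average)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical reading: the service handles 18 requests/s with 1 replica.
print(desired_replicas(18, 5, 1))  # 4
```

With the spec above (target 5, at most 5 replicas), any sustained reading over 20 would pin the deployment at the maximum of 5 replicas.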

Now we can generate some sample load to watch the HPA in action. I use hey for this:

hey -c 10 -n 5000 http://hpa.demo

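Conclusion

In this article we explored the flexibility of the new monitoring in Rancher 2.5. Developers and cluster admins can use this stack to monitor their workloads, deploy visualizations, and take advantage of the advanced workload management features available in Kubernetes.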