I. What is KubeSphere
KubeSphere is a Kubernetes management platform released by QingCloud. It runs on top of a Kubernetes cluster and provides monitoring, DevOps, a service mesh, and a series of other higher-level capabilities.
QingCloud also released the kubekey tool for deploying Kubernetes and KubeSphere. It supports:
- deploying Kubernetes with kubekey;
- deploying Kubernetes plus KubeSphere with kubekey;
II. Deploying a multi-master KubeSphere
When deploying with the kubekey tool (kk for short), it first deploys Kubernetes and then deploys KubeSphere on top of it.
1. Prepare the nodes
Prepare three CentOS 7.* nodes:
node1 192.168.1.101
node2 192.168.1.102
node3 192.168.1.103
Install the dependency packages on every node:
yum install -y socat conntrack ebtables ipset
2. Prepare kubekey
Download kk on the first node:
export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | VERSION=v1.1.1 sh -
3. Create the cluster configuration
Generate the cluster configuration with the kk tool:
./kk create config --with-kubernetes v1.20.6 --with-kubesphere v3.1.1
Where:
- --with-kubernetes v1.20.6: the Kubernetes version to install (v1.20.6);
- --with-kubesphere v3.1.1: the KubeSphere version to install (v3.1.1); if you do not want the KubeSphere components, omit this flag;
After the command completes, a config-sample.yaml file is generated. Edit the file:
- modify the node entries under hosts;
- in a multi-master environment, the LB address must be provided in controlPlaneEndpoint; here we simply fill in the IP of one of the masters;
apiVersion: kubekey.kubesphere.io/v1alpha1
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: node1, address: 192.168.1.101, internalAddress: 192.168.1.101, user: root, password: ******}
  - {name: node2, address: 192.168.1.102, internalAddress: 192.168.1.102, user: root, password: ******}
  - {name: node3, address: 192.168.1.103, internalAddress: 192.168.1.103, user: root, password: ******}
  roleGroups:
    etcd:
    - node1
    - node2
    - node3
    master:
    - node1
    - node2
    - node3
    worker:
    - node1
    - node2
    - node3
  controlPlaneEndpoint:
    domain: lb.kubesphere.local
    address: "192.168.1.101"
    port: 6443
  kubernetes:
    version: v1.20.6
    imageRepo: kubesphere
    clusterName: cluster.local
  ...
4. Create the cluster
Create the cluster with the configuration from the previous step:
./kk create cluster -f config-sample.yaml
While this command runs, it first creates the Kubernetes cluster and then installs the KubeSphere components on top of it.
The installation log is printed during the process; in addition, you can follow the log with the following command:
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
After installation completes, the login information for the KubeSphere UI is printed:
#####################################################
### Welcome to KubeSphere! ###
#####################################################
Console: http://192.168.1.101:30880
Account: ****
Password: ****
5. Delete the cluster
If an error occurs during installation, you can delete the cluster, fix the configuration, and reinstall:
## delete the cluster
./kk delete cluster -f config-sample.yaml
III. KubeSphere monitoring components
Installing KubeSphere installs the Prometheus monitoring components by default:
# kubectl get pod -n kubesphere-monitoring-system
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 7d6h
alertmanager-main-1 2/2 Running 0 7d6h
alertmanager-main-2 2/2 Running 0 7d6h
kube-state-metrics-577b8b4cf-7wp5w 3/3 Running 0 7d6h
node-exporter-2p7jz 2/2 Running 0 7d6h
node-exporter-5njs6 2/2 Running 0 7d6h
node-exporter-6ndb9 2/2 Running 0 7d6h
notification-manager-deployment-97dfccc89-625b8 1/1 Running 0 7d6h
notification-manager-deployment-97dfccc89-k8767 1/1 Running 0 7d6h
notification-manager-operator-59cbfc566b-4wp9m 2/2 Running 4 7d6h
prometheus-k8s-0 3/3 Running 1 4d4h
prometheus-k8s-1 3/3 Running 1 4d4h
prometheus-operator-8f97cb8c6-dq4d2 2/2 Running 0 7d6h
KubeSphere's monitoring stack uses prometheus-operator to manage Prometheus: the operator watches CRD objects such as PrometheusRule and ServiceMonitor and reconciles their state changes.
The replica counts of the deployed Prometheus and Alertmanager are specified in ks-installer:
Prometheus replicas:
- if the number of nodes is < 3, the default number of Prometheus replicas is 1;
- if the number of nodes is >= 3, the default number of Prometheus replicas is 2;
// ks-installer/roles/ks-monitor/template/prometheus-prometheus.yaml.j2
{% if nodeNum is defined and nodeNum < 3 %}
replicas: {{monitoring.prometheus.replicas | default(monitoring.prometheusReplicas) | default(1) }}
{% else %}
replicas: {{monitoring.prometheus.replicas | default(monitoring.prometheusReplicas) | default(2) }}
{% endif %}
Alertmanager replicas:
- if the number of nodes is < 3, the default number of Alertmanager replicas is 1;
- if the number of nodes is >= 3, the default number of Alertmanager replicas is 3;
// ks-installer/roles/ks-monitor/template/alertmanager-alertmanager.yaml.j2
{% if nodeNum is defined and nodeNum < 3 %}
replicas: {{monitoring.alertmanager.replicas | default(monitoring.alertmanagerReplicas) | default(1) }}
{% else %}
replicas: {{monitoring.alertmanager.replicas | default(monitoring.alertmanagerReplicas) | default(3) }}
{% endif %}
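The branching in the two templates above boils down to a simple rule. A minimal sketch in Python (not ks-installer source), assuming no explicit replica overrides are set in the installer's monitoring configuration:

```python
# Sketch of the defaults encoded by the two Jinja templates above,
# assuming monitoring.*.replicas overrides are NOT configured.
def default_replicas(node_num: int) -> tuple[int, int]:
    """Return (prometheus_replicas, alertmanager_replicas) for a cluster size."""
    if node_num < 3:
        return 1, 1
    return 2, 3

print(default_replicas(2))  # (1, 1)
print(default_replicas(3))  # (2, 3)
```

So a three-node cluster like the one deployed above ends up with 2 Prometheus and 3 Alertmanager replicas by default.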
IV. KubeSphere monitoring API source code
Taking KubeSphere's cluster CPU usage monitoring as an example, let's analyze the front-end and back-end monitoring API and its source-level implementation:
1. Front-end call:
GET http://192.168.1.101:30880/kapis/monitoring.kubesphere.io/v1alpha3/cluster?start=1653737380&end=1653743380&step=120s&times=50&metrics_filter=cluster_cpu_utilisation
As you can see, the metric name is cluster_cpu_utilisation.
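For illustration, a request like the one above can be assembled as follows. This is a minimal sketch, not KubeSphere front-end source; the host, port, and parameter values simply mirror the example request:

```python
from urllib.parse import urlencode

# Base URL of the KubeSphere monitoring API (host/port taken from the example).
BASE = "http://192.168.1.101:30880/kapis/monitoring.kubesphere.io/v1alpha3"

params = {
    "start": 1653737380,                          # Unix time, range start
    "end": 1653743380,                            # Unix time, range end
    "step": "120s",                               # sampling interval
    "times": 50,                                  # number of data points
    "metrics_filter": "cluster_cpu_utilisation",  # which metric(s) to return
}
url = f"{BASE}/cluster?{urlencode(params)}"
print(url)
```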
2. Back-end code:
The API registration code:
// kubesphere/pkg/kapis/monitoring/v1alpha3/register.go
func AddToContainer(c *restful.Container, ...) error {
...
ws.Route(ws.GET("/cluster").
To(h.handleClusterMetricsQuery).
Doc("Get cluster-level metric data.").
Param(ws.QueryParameter("metrics_filter", "The metric name filter consists of a regexp pattern. It specifies which metric data to return. For example, the following filter matches both cluster CPU usage and disk usage: `cluster_cpu_usage|cluster_disk_size_usage`. View available metrics at [kubesphere.io](https://docs.kubesphere.io/advanced-v2.0/zh-CN/api-reference/monitoring-metrics/).").DataType("string").Required(false)).
Param(ws.QueryParameter("start", "Start time of query. Use **start** and **end** to retrieve metric data over a time span. It is a string with Unix time format, eg. 1559347200.").DataType("string").Required(false)).
Param(ws.QueryParameter("end", "End time of query. Use **start** and **end** to retrieve metric data over a time span. It is a string with Unix time format, eg. 1561939200.").DataType("string").Required(false)).
Param(ws.QueryParameter("step", "Time interval. Retrieve metric data at a fixed interval within the time range of start and end. It requires both **start** and **end** are provided. The format is [0-9]+[smhdwy]. Defaults to 10m (i.e. 10 min).").DataType("string").DefaultValue("10m").Required(false)).
Param(ws.QueryParameter("time", "A timestamp in Unix time format. Retrieve metric data at a single point in time. Defaults to now. Time and the combination of start, end, step are mutually exclusive.").DataType("string").Required(false)).
Metadata(restfulspec.KeyOpenAPITags, []string{constants.ClusterMetricsTag}).
Writes(model.Metrics{}).
Returns(http.StatusOK, respOK, model.Metrics{})).
Produces(restful.MIME_JSON)
...
}
The API handler code:
// kubesphere/pkg/kapis/monitoring/v1alpha3/handler.go
func (h handler) handleClusterMetricsQuery(req *restful.Request, resp *restful.Response) {
	params := parseRequestParams(req)
	opt, err := h.makeQueryOptions(params, monitoring.LevelCluster)
	if err != nil {
		api.HandleBadRequest(resp, nil, err)
		return
	}
	h.handleNamedMetricsQuery(resp, opt)
}
For the actual query, it sends a metric query to Prometheus:
- build the query expression: makeExpr(metric, *opts), where metric=cluster_cpu_utilisation;
- run the query: p.client.QueryRange(…) issues the query against Prometheus;
// kubesphere/pkg/simple/client/monitoring/prometheus/prometheus.go
func (p prometheus) GetNamedMetricsOverTime(metrics []string, start, end time.Time, step time.Duration, o monitoring.QueryOption) []monitoring.Metric {
	var res []monitoring.Metric
	...
	timeRange := apiv1.Range{
		Start: start,
		End:   end,
		Step:  step,
	}
	for _, metric := range metrics {
		wg.Add(1)
		go func(metric string) {
			parsedResp := monitoring.Metric{MetricName: metric}
			// query prometheus
			value, _, err := p.client.QueryRange(context.Background(), makeExpr(metric, *opts), timeRange)
			parsedResp.MetricData = parseQueryRangeResp(value, genMetricFilter(o))
			...
			res = append(res, parsedResp)
			wg.Done()
		}(metric)
	}
	wg.Wait()
	return res
}
Now let's look closely at how the query expression is constructed:
// kubesphere/pkg/simple/client/monitoring/prometheus/promql.go
var promQLTemplates = map[string]string{
//cluster
"cluster_cpu_utilisation": ":node_cpu_utilisation:avg1m",
...
}
func makeExpr(metric string, opts monitoring.QueryOptions) string {
	tmpl := promQLTemplates[metric]
	switch opts.Level {
	case monitoring.LevelCluster:
		return tmpl
	...
	}
}
For the front-end query cluster_cpu_utilisation, what gets sent to Prometheus is ":node_cpu_utilisation:avg1m", which corresponds to the following recording rule on Prometheus:
- expr: |
avg(irate(node_cpu_used_seconds_total{job="node-exporter"}[5m]))
record: :node_cpu_utilisation:avg1m
So the cluster_cpu_utilisation metric is ultimately the result of the following PromQL query:
avg(irate(node_cpu_used_seconds_total{job="node-exporter"}[5m]))
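On the wire, the back-end's p.client.QueryRange(...) call amounts to an HTTP request to Prometheus's /api/v1/query_range endpoint. A minimal sketch (not KubeSphere source) of the whole chain, from metric name to the final Prometheus URL; the Prometheus base URL here is an assumption:

```python
from urllib.parse import urlencode

# Metric-name -> PromQL mapping, mirroring the promQLTemplates entry above
# (the value is a recording rule name, which is itself valid PromQL).
PROMQL_TEMPLATES = {
    "cluster_cpu_utilisation": ":node_cpu_utilisation:avg1m",
}

def build_query_range_url(base: str, metric: str, start: int, end: int, step: str) -> str:
    """Resolve the metric name to PromQL and build the query_range request URL."""
    expr = PROMQL_TEMPLATES[metric]
    qs = urlencode({"query": expr, "start": start, "end": end, "step": step})
    return f"{base}/api/v1/query_range?{qs}"

# The base URL is a hypothetical in-cluster Prometheus service address.
url = build_query_range_url(
    "http://prometheus-k8s.kubesphere-monitoring-system.svc:9090",
    "cluster_cpu_utilisation", 1653737380, 1653743380, "120s",
)
print(url)
```

Prometheus evaluates the recording rule periodically, so querying ":node_cpu_utilisation:avg1m" returns precomputed results instead of re-running the avg(irate(...)) expression on every request.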