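Key point 01: software configuration comes from two main sources: command-line arguments and configuration files
- Kubernetes components generally follow this pattern
- Take kube-scheduler as an example
- Command-line example: arguments are passed as --xxx=xxx. Every argument has a default value, which is used when you do not pass it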
/usr/local/bin/kube-scheduler --log-dir=/var/log/k8s --logtostderr=false --alsologtostderr=true --config=/etc/k8s/kube-scheduler.yaml --kube-api-qps=500 --kube-api-burst=500 --authentication-kubeconfig=/etc/k8s/scheduler.conf --authorization-kubeconfig=/etc/k8s/scheduler.conf --kubeconfig=/etc/k8s/scheduler.conf --leader-elect=true --v=2
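- In the command line above, --config=xxx.yaml specifies the path to the configuration file
- For example, looking at the contents of this configuration file shows that it is essentially the official default configuration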
cat /etc/k8s/kube-scheduler.yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/etc/k8s/scheduler.conf"
leaderElection:
  leaderElect: true
metricsBindAddress: 0.0.0.0:10251
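To make the "defaults, configuration file, command-line arguments" layering concrete, here is a minimal Go sketch. It is not kube-scheduler's real loading code: the Config struct, the loadConfig function, the JSON file format, and the rule that an explicitly set flag wins over the file are all assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
)

// Config is a hypothetical configuration struct for this sketch.
type Config struct {
	LeaderElect bool `json:"leaderElect"`
	QPS         int  `json:"qps"`
}

// loadConfig merges three layers: built-in defaults, then the contents of
// the configuration file, then any flag set explicitly on the command line.
func loadConfig(args []string, fileContent []byte) (Config, error) {
	cfg := Config{LeaderElect: true, QPS: 50} // built-in defaults

	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	leaderElect := fs.Bool("leader-elect", cfg.LeaderElect, "enable leader election")
	qps := fs.Int("kube-api-qps", cfg.QPS, "client QPS")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}

	// Values from the file replace the defaults.
	if len(fileContent) > 0 {
		if err := json.Unmarshal(fileContent, &cfg); err != nil {
			return cfg, err
		}
	}

	// Visit only reports flags that were actually set, so flags left at
	// their default do not clobber values that came from the file.
	fs.Visit(func(f *flag.Flag) {
		switch f.Name {
		case "leader-elect":
			cfg.LeaderElect = *leaderElect
		case "kube-api-qps":
			cfg.QPS = *qps
		}
	})
	return cfg, nil
}

func main() {
	file := []byte(`{"leaderElect": false, "qps": 100}`)
	cfg, err := loadConfig([]string{"--kube-api-qps=500"}, file)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg) // {LeaderElect:false QPS:500}
}
```

With two sources feeding one struct, it quickly becomes hard to tell from the outside which value won, which is exactly the problem the next key point addresses.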
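Key point 02: Kubernetes components expose an endpoint for viewing the configuration that is in effect in the running application
Why provide it? The main reasons are:
- Configuration from the command line and from the configuration file may overlap, with one overriding the other
- There are too many configuration items
Without hot reloading of configuration, it is hard to check whether a change has taken effect:
- The configuration file has been changed, but you have forgotten whether you changed it before or after restarting the application
- So well-engineered projects keep an HTTP endpoint for this
- It prints the configuration currently in effect in the application's memory, which makes troubleshooting easier
- We can imitate this endpoint in the Go projects we write later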
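A minimal sketch of such an endpoint in our own Go project might look like the following. The AppConfig struct, the configStore type, and the "appconfig" key are hypothetical names chosen for this example; only the /configz path is borrowed from Kubernetes. The request goes through httptest so the sketch runs without opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// AppConfig is a hypothetical configuration struct for this sketch.
type AppConfig struct {
	ListenAddr string `json:"listenAddr"`
	LogLevel   int    `json:"logLevel"`
}

// configStore guards the configuration that is currently in effect.
type configStore struct {
	mu  sync.RWMutex
	cfg AppConfig
}

// Set replaces the in-memory configuration.
func (s *configStore) Set(c AppConfig) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cfg = c
}

// ServeHTTP writes the in-memory configuration as JSON.
func (s *configStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	s.mu.RLock()
	cfg := s.cfg
	s.mu.RUnlock()

	body, err := json.Marshal(map[string]AppConfig{"appconfig": cfg})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

// fetchConfigz registers the store at /configz and returns the body
// the handler serves for a GET request.
func fetchConfigz(s *configStore) string {
	mux := http.NewServeMux()
	mux.Handle("/configz", s)

	req := httptest.NewRequest(http.MethodGet, "/configz", nil)
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	return rec.Body.String()
}

func main() {
	store := &configStore{}
	store.Set(AppConfig{ListenAddr: "0.0.0.0:8080", LogLevel: 2})
	fmt.Println(fetchConfigz(store))
}
```

Because the handler reads the struct the application is actually using, rather than re-reading the file, the output answers the "did my change take effect?" question directly.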
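Key point 03: how to request the scheduler's configuration endpoint
The endpoint that Kubernetes components expose for viewing configuration is configz
Accessing a component's endpoint requires authentication and authorization, which we can handle with a ServiceAccount (sa)
How to create the RBAC objects in a 1.24 cluster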
apiVersion: rbac.authorization.k8s.io/v1 # API version
kind: ClusterRole # kind
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: # resources
  - nodes
  - nodes/metrics
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics", "/configz"] # /configz is needed for the configz request below
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus # custom name
  namespace: kube-system # namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef: # the Role to bind
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects: # subjects
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  namespace: kube-system
  name: prometheus
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token
Command to get the token
root@k8s-master01:~# TOKEN=$(kubectl -n kube-system get secret prometheus -o jsonpath='{.data.token}' | base64 --decode)
root@k8s-master01:~# echo $TOKEN
eyJhbGciOiJSUzI1NiIsImtpZCI6ImFVMS1mYlhobWIxcF92djBwbUIxZDhTVlFWd0VNa3VpNDlmOUhqcG9qSlkifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI2OTg2NWUwYy0yOGE4LTQ3YTEtYWEzYy03NThmNDlkYjA1YWUiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.SlORtZtVvLptYWE3zvblaOMxHNMBrHVTTHra7fO1RrwxdK3Bzc42ETLvkzVfQmAQlWrq5yiB4HiKFLe4qY2KVwK3qLDS_sWADLI16Sv8-O1Dt0oOQ0UZD0VOSGY0XEq2EGUxgxnx_JWllgEuMd0rjxtAtyjFh9wjCo_07lFCj44BffGGFp6Kovd8Dl_CJKpORakaJW-haIvTmTlbFPbRKojRTyKvtNCVn0zIXsz8Esp7z9XZmtUvZmHqNlY7bAFtGM9qLWUY_PkM1C0lQ2ZDKASdhZpx6LJr1Wo4WSNILCVfECT0sd6TnFHbgd1NwBc0kcTct5VbST76AJwUpG5esA |
Pass the TOKEN above as a bearer token in the HTTP request
The curl command for the request is as follows (run it on the node where kube-scheduler is deployed, usually the master)
curl -k -s https://localhost:10259/configz --header "Authorization: Bearer $TOKEN" | python -m json.tool
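The same request can be made from Go. The sketch below mirrors the curl command under a few assumptions: newConfigzRequest is a hypothetical helper name, the token is read from the TOKEN environment variable, and InsecureSkipVerify plays the role of curl's -k flag (acceptable for a quick check on the node, not for production code).

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// newConfigzRequest builds a GET request for <baseURL>/configz that
// carries the ServiceAccount token as a bearer token.
func newConfigzRequest(baseURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/configz", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := newConfigzRequest("https://localhost:10259", os.Getenv("TOKEN"))
	if err != nil {
		fmt.Println("build request:", err)
		return
	}

	client := &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			// Equivalent to curl -k: skip verification of the serving certificate.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read body:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```

Run it on the scheduler node with TOKEN exported in the environment, the same way the curl command is run.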
Key point 04: analyze which plugins are enabled from the configuration sections of the returned JSON
- The JSON data is as follows
{
  "componentconfig": {
    "AlgorithmSource": {
      "Policy": null,
      "Provider": "DefaultProvider"
    },
    "ClientConnection": {"AcceptContentTypes": "", "Burst": 100, "ContentType": "application/vnd.kubernetes.protobuf", "Kubeconfig": "/etc/k8s/scheduler.conf", "QPS": 50},
    "EnableContentionProfiling": true,
    "EnableProfiling": true,
    "Extenders": null,
    "HealthzBindAddress": "0.0.0.0:10251",
    "LeaderElection": {
      "LeaderElect": true,
      "LeaseDuration": "15s",
      "RenewDeadline": "10s",
      "ResourceLock": "leases",
      "ResourceName": "kube-scheduler",
      "ResourceNamespace": "kube-system",
      "RetryPeriod": "2s"
    },
    "MetricsBindAddress": "0.0.0.0:10251",
    "Parallelism": 16,
    "PercentageOfNodesToScore": 0,
    "PodInitialBackoffSeconds": 1,
    "PodMaxBackoffSeconds": 10,
    "Profiles": [
      {
        "PluginConfig": [
          {
            "Args": {
              "MinCandidateNodesAbsolute": 100,
              "MinCandidateNodesPercentage": 10
            },
            "Name": "DefaultPreemption"
          },
          {
            "Args": {"HardPodAffinityWeight": 1},
            "Name": "InterPodAffinity"
          },
          {
            "Args": {"AddedAffinity": null},
            "Name": "NodeAffinity"
          },
          {
            "Args": {
              "IgnoredResourceGroups": null,
              "IgnoredResources": null
            },
            "Name": "NodeResourcesFit"
          },
          {
            "Args": {
              "Resources": [
                {"Name": "cpu", "Weight": 1},
                {"Name": "memory", "Weight": 1}
              ]
            },
            "Name": "NodeResourcesLeastAllocated"
          },
          {
            "Args": {
              "DefaultConstraints": null,
              "DefaultingType": "System"
            },
            "Name": "PodTopologySpread"
          },
          {
            "Args": {"BindTimeoutSeconds": 600},
            "Name": "VolumeBinding"
          }
        ],
        "Plugins": {
          "Bind": {
            "Disabled": null,
            "Enabled": [
              {"Name": "DefaultBinder", "Weight": 0}
            ]
          },
          "Filter": {
            "Disabled": null,
            "Enabled": [
              {"Name": "NodeUnschedulable", "Weight": 0},
              {"Name": "NodeName", "Weight": 0},
              {"Name": "TaintToleration", "Weight": 0},
              {"Name": "NodeAffinity", "Weight": 0},
              {"Name": "NodePorts", "Weight": 0},
              {"Name": "NodeResourcesFit", "Weight": 0},
              {"Name": "VolumeRestrictions", "Weight": 0},
              {"Name": "EBSLimits", "Weight": 0},
              {"Name": "GCEPDLimits", "Weight": 0},
              {"Name": "NodeVolumeLimits", "Weight": 0},
              {"Name": "AzureDiskLimits", "Weight": 0},
              {"Name": "VolumeBinding", "Weight": 0},
              {"Name": "VolumeZone", "Weight": 0},
              {"Name": "PodTopologySpread", "Weight": 0},
              {"Name": "InterPodAffinity", "Weight": 0}
            ]
          },
          "Permit": {
            "Disabled": null,
            "Enabled": null
          },
          "PostBind": {
            "Disabled": null,
            "Enabled": null
          },
          "PostFilter": {
            "Disabled": null,
            "Enabled": [
              {"Name": "DefaultPreemption", "Weight": 0}
            ]
          },
          "PreBind": {
            "Disabled": null,
            "Enabled": [
              {"Name": "VolumeBinding", "Weight": 0}
            ]
          },
          "PreFilter": {
            "Disabled": null,
            "Enabled": [
              {"Name": "NodeResourcesFit", "Weight": 0},
              {"Name": "NodePorts", "Weight": 0},
              {"Name": "PodTopologySpread", "Weight": 0},
              {"Name": "InterPodAffinity", "Weight": 0},
              {"Name": "VolumeBinding", "Weight": 0}
            ]
          },
          "PreScore": {
            "Disabled": null,
            "Enabled": [
              {"Name": "InterPodAffinity", "Weight": 0},
              {"Name": "PodTopologySpread", "Weight": 0},
              {"Name": "TaintToleration", "Weight": 0}
            ]
          },
          "QueueSort": {
            "Disabled": null,
            "Enabled": [
              {"Name": "PrioritySort", "Weight": 0}
            ]
          },
          "Reserve": {
            "Disabled": null,
            "Enabled": [
              {"Name": "VolumeBinding", "Weight": 0}
            ]
          },
          "Score": {
            "Disabled": null,
            "Enabled": [
              {"Name": "NodeResourcesBalancedAllocation", "Weight": 1},
              {"Name": "ImageLocality", "Weight": 1},
              {"Name": "InterPodAffinity", "Weight": 1},
              {"Name": "NodeResourcesLeastAllocated", "Weight": 1},
              {"Name": "NodeAffinity", "Weight": 1},
              {"Name": "NodePreferAvoidPods", "Weight": 10000},
              {"Name": "PodTopologySpread", "Weight": 2},
              {"Name": "TaintToleration", "Weight": 1}
            ]
          }
        },
        "SchedulerName": "default-scheduler"
      }
    ]
  }
}
Prerequisite knowledge: the Kubernetes scheduling framework (scheduler framework)
- Documentation location
- Make sure you know this architecture diagram by heart
Each entry in componentconfig.Profiles has two main parts:
The Plugins section shows which plugins are enabled and disabled at each extension point of the scheduling framework:
- At each stage, the plugins in the enabled list are executed in order, and the plugins in the disabled list are not executed
- For example, the Filter stage (excerpted here) contains plugins we see all the time: NodeUnschedulable, which removes unschedulable nodes; NodeName; TaintToleration; and NodeResourcesFit, which filters on the resources in the pod's requests
"Filter": {
  "Disabled": null,
  "Enabled": [
    {"Name": "NodeUnschedulable", "Weight": 0},
    {"Name": "NodeName", "Weight": 0},
    {"Name": "TaintToleration", "Weight": 0},
    {"Name": "NodeAffinity", "Weight": 0},
    {"Name": "NodePorts", "Weight": 0},
    {"Name": "NodeResourcesFit", "Weight": 0}
  ]
},
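To read the Plugins section programmatically rather than by eye, a small Go sketch like the one below can list the enabled plugins for a given extension point. The type names and the enabledPlugins function are hypothetical and only model the fields used here; they are not the scheduler's own types. The sample input is a trimmed version of the configz output above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plugin mirrors one entry in an Enabled or Disabled list.
type plugin struct {
	Name   string
	Weight int32
}

// pluginSet mirrors the value stored under one extension point.
type pluginSet struct {
	Enabled  []plugin
	Disabled []plugin
}

// profile models only the fields of a profile that this sketch needs.
type profile struct {
	SchedulerName string
	Plugins       map[string]pluginSet // keyed by extension point, e.g. "Filter"
}

// configzDump models the outer shape of the configz response.
type configzDump struct {
	ComponentConfig struct {
		Profiles []profile
	} `json:"componentconfig"`
}

// enabledPlugins returns the names of the plugins enabled at the given
// extension point, in the order they appear, across all profiles.
func enabledPlugins(raw []byte, point string) ([]string, error) {
	var dump configzDump
	if err := json.Unmarshal(raw, &dump); err != nil {
		return nil, err
	}
	var names []string
	for _, p := range dump.ComponentConfig.Profiles {
		for _, pl := range p.Plugins[point].Enabled {
			names = append(names, pl.Name)
		}
	}
	return names, nil
}

const sample = `{"componentconfig":{"Profiles":[{"SchedulerName":"default-scheduler","Plugins":{
  "Filter":{"Disabled":null,"Enabled":[{"Name":"NodeUnschedulable","Weight":0},{"Name":"NodeName","Weight":0}]},
  "Permit":{"Disabled":null,"Enabled":null}}}]}}`

func main() {
	names, err := enabledPlugins([]byte(sample), "Filter")
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // [NodeUnschedulable NodeName]
}
```

An extension point whose Enabled list is null, such as Permit in the sample, simply yields an empty result.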
- The PluginConfig section holds the configuration fields of each plugin. Args is a key-value map: each plugin has different configuration fields, so a map is the most convenient representation
- For example, the NodeResourcesLeastAllocated plugin
{
  "Args": {
    "Resources": [
      {"Name": "cpu", "Weight": 1},
      {"Name": "memory", "Weight": 1}
    ]
  },
  "Name": "NodeResourcesLeastAllocated"
},
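Because the shape of Args depends on the plugin, one way to handle it in Go is to keep Args as raw JSON and decode it only once the plugin name is known. The sketch below does this for the NodeResourcesLeastAllocated entry shown above; the type names and decodeLeastAllocated are hypothetical, not the scheduler's own decoding path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginConfig keeps Args undecoded until the plugin name is known.
type pluginConfig struct {
	Name string
	Args json.RawMessage
}

// resourceSpec is one entry of the Resources list.
type resourceSpec struct {
	Name   string
	Weight int64
}

// leastAllocatedArgs models the Args of NodeResourcesLeastAllocated
// as they appear in the configz output.
type leastAllocatedArgs struct {
	Resources []resourceSpec
}

// decodeLeastAllocated decodes a single PluginConfig entry and rejects
// entries that belong to a different plugin.
func decodeLeastAllocated(raw []byte) (leastAllocatedArgs, error) {
	var pc pluginConfig
	var args leastAllocatedArgs
	if err := json.Unmarshal(raw, &pc); err != nil {
		return args, err
	}
	if pc.Name != "NodeResourcesLeastAllocated" {
		return args, fmt.Errorf("unexpected plugin %q", pc.Name)
	}
	err := json.Unmarshal(pc.Args, &args)
	return args, err
}

const entry = `{"Args":{"Resources":[{"Name":"cpu","Weight":1},{"Name":"memory","Weight":1}]},"Name":"NodeResourcesLeastAllocated"}`

func main() {
	args, err := decodeLeastAllocated([]byte(entry))
	if err != nil {
		panic(err)
	}
	for _, r := range args.Resources {
		fmt.Printf("%s weight=%d\n", r.Name, r.Weight)
	}
}
```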
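Key point 06: a discussion of plugin weights
- In the configuration above, plugins in many stages, such as Filter and Score, carry a weight setting
{
  "Name": "NodePreferAvoidPods",
  "Weight": 10000
},
- However, only plugins in Score have a Weight greater than 0
- Plugins in the other stages all have a Weight of 0
Let's check this against the code: find Plugin.Weight (1.22 branch)
- Location: D:\go_path\src\github.com\kubernetes\kubernetes\pkg\scheduler\apis\config\types.go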
// Plugin specifies a plugin name and its weight when applicable. Weight is used only for Score plugins.
type Plugin struct {
	// Name defines the name of plugin
	Name string
	// Weight defines the weight of plugin, only used for Score plugins.
	Weight int32
}
- The Plugin struct is just a name and a weight
- The comment states clearly that the weight only takes effect for plugins in the Score stage
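To see why a weight only makes sense for Score plugins, it helps to picture how scores are combined. The sketch below is a simplified illustration, not the scheduler's actual code: it assumes each Score plugin has already produced a normalized score for a node, and that the node's total is the sum of weight times score. The scorePlugin type, weightedTotal, and the example scores are made up for this example; the weights are taken from the configuration above.

```go
package main

import "fmt"

// scorePlugin is a hypothetical pairing of a Score plugin name and its weight.
type scorePlugin struct {
	Name   string
	Weight int64
}

// weightedTotal sums weight*score over all plugins. scores maps a plugin
// name to the node's normalized score from that plugin.
func weightedTotal(plugins []scorePlugin, scores map[string]int64) int64 {
	var total int64
	for _, p := range plugins {
		total += p.Weight * scores[p.Name]
	}
	return total
}

func main() {
	plugins := []scorePlugin{
		{Name: "NodeResourcesLeastAllocated", Weight: 1},
		{Name: "PodTopologySpread", Weight: 2},
	}
	// Made-up per-plugin scores for two candidate nodes.
	nodeA := map[string]int64{"NodeResourcesLeastAllocated": 80, "PodTopologySpread": 50}
	nodeB := map[string]int64{"NodeResourcesLeastAllocated": 60, "PodTopologySpread": 70}

	fmt.Println("node-a:", weightedTotal(plugins, nodeA)) // node-a: 180
	fmt.Println("node-b:", weightedTotal(plugins, nodeB)) // node-b: 200
}
```

In this example node-b wins even though node-a has more free resources, because PodTopologySpread carries twice the weight. Filter plugins, by contrast, only answer pass or fail for a node, so there is nothing for a weight to scale.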
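Recommended articles by Teacher Xiaoyi on the k8s scheduler source code
- Source code details of how the default k8s scheduler filters on pod resource requests
- Reading the kube-scheduler source code through an e2e slow-scheduling alert in a k8s cluster
So how should a scheduler that schedules based on real load be written?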
Posted in: kubernetes
2023-02-24