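Key point 01: software configuration mainly comes from two sources: command-line flags and configuration files
- k8s components generally follow this pattern
- Take kube-scheduler as an example
- Command-line flags: arguments are passed as --xxx=xxx. Every flag has a default value, which applies when the flag is not passed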
/usr/local/bin/kube-scheduler --log-dir=/var/log/k8s --logtostderr=false --alsologtostderr=true --config=/etc/k8s/kube-scheduler.yaml --kube-api-qps=500 --kube-api-burst=500 --authentication-kubeconfig=/etc/k8s/scheduler.conf --authorization-kubeconfig=/etc/k8s/scheduler.conf --kubeconfig=/etc/k8s/scheduler.conf --leader-elect=true --v=2
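- In the command line above, --config=xxx.yaml specifies the path of the configuration file
- Looking at the contents of this file shows that it is essentially the default configuration from the official documentation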
cat /etc/k8s/kube-scheduler.yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: "/etc/k8s/scheduler.conf"
leaderElection:
  leaderElect: true
metricsBindAddress: 0.0.0.0:10251
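The flag half of this pattern can be sketched in Go with the standard `flag` package. This is only an illustration of "every flag has a default, and a `--config` flag points at a file": the `Options` type, the `parseOptions` helper and the default values are assumptions made for the example, not kube-scheduler's actual code.

```go
package main

import (
	"flag"
	"fmt"
)

// Options is a hypothetical subset of settings, not kube-scheduler's real type.
type Options struct {
	ConfigPath  string  // --config
	QPS         float64 // --kube-api-qps
	Burst       int     // --kube-api-burst
	LeaderElect bool    // --leader-elect
}

// parseOptions registers each flag with a default value and then parses args.
// Flags that are not passed keep their defaults.
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("demo-scheduler", flag.ContinueOnError)
	fs.StringVar(&o.ConfigPath, "config", "", "path to the configuration file")
	fs.Float64Var(&o.QPS, "kube-api-qps", 50, "QPS to use against the API server")
	fs.IntVar(&o.Burst, "kube-api-burst", 100, "burst to use against the API server")
	fs.BoolVar(&o.LeaderElect, "leader-elect", true, "enable leader election")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	return o, nil
}

func main() {
	// Only two flags are passed; the others fall back to their defaults.
	o, err := parseOptions([]string{
		"--config=/etc/k8s/kube-scheduler.yaml",
		"--kube-api-qps=500",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("config=%s qps=%v burst=%d leaderElect=%v\n",
		o.ConfigPath, o.QPS, o.Burst, o.LeaderElect)
}
```

A real component would go on to read the file named by `ConfigPath` and merge it with the flag values, which is exactly where the two sources can override each other.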
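Key point 02: k8s components expose an endpoint for viewing the effective configuration of the running application
- Why provide it? The main reasons are:
- Configuration comes from both command-line flags and the configuration file, and the two may override each other in places
- There are too many configuration items
- Without configuration hot reload, it is hard to check whether a change has taken effect:
- The configuration file has been changed, but you have forgotten whether the change was made before or after the application was restarted
- So well-built projects keep an HTTP endpoint
- It prints the configuration currently in effect in the application's memory, which makes troubleshooting easier
- Our own Go projects can follow the same approach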
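A minimal sketch of such an endpoint in a Go project might look like the following. The `Config` fields, the `configStore` type and the `fetchConfigz` helper are hypothetical names chosen for the example; the only idea borrowed from k8s is serving the in-memory configuration as JSON under `/configz`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Config is a hypothetical application configuration.
type Config struct {
	LogLevel    int  `json:"logLevel"`
	LeaderElect bool `json:"leaderElect"`
}

// configStore holds the configuration that is currently in effect in memory.
type configStore struct {
	mu  sync.RWMutex
	cfg Config
}

// ServeHTTP writes the in-memory configuration as JSON.
func (s *configStore) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	s.mu.RLock()
	cfg := s.cfg
	s.mu.RUnlock()
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(cfg); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// fetchConfigz starts a throwaway test server and reads /configz from it.
func fetchConfigz(store *configStore) (string, error) {
	mux := http.NewServeMux()
	mux.Handle("/configz", store)
	srv := httptest.NewServer(mux)
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/configz")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return strings.TrimSpace(string(body)), err
}

func main() {
	store := &configStore{cfg: Config{LogLevel: 2, LeaderElect: true}}
	out, err := fetchConfigz(store)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the handler serializes the struct the process is actually using, the output reflects what took effect after flags and files were merged, not what is on disk.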
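Key point 03: how to request the scheduler's configuration endpoint
The endpoint k8s components expose for viewing configuration is configz
Accessing a component endpoint requires authentication and authorization, which we can set up with a ServiceAccount (sa)
Creating the RBAC objects on a 1.24 cluster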
apiVersion: rbac.authorization.k8s.io/v1 # API version
kind: ClusterRole # object kind
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: # resources
  - nodes
  - nodes/metrics
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics", "/configz"] # /configz is required for the configz request in this article
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus # custom name
  namespace: kube-system # namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef: # the role to bind
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects: # the subjects the role is bound to
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  namespace: kube-system
  name: prometheus
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token
Command to get the token
root@k8s-master01:~# TOKEN=$(kubectl -n kube-system get secret prometheus -o jsonpath='{.data.token}'| base64 --decode )
root@k8s-master01:~# echo $TOKEN
eyJhbGciOiJSUzI1NiIsImtpZCI6ImFVMS1mYlhobWIxcF92djBwbUIxZDhTVlFWd0VNa3VpNDlmOUhqcG9qSlkifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI2OTg2NWUwYy0yOGE4LTQ3YTEtYWEzYy03NThmNDlkYjA1YWUiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06cHJvbWV0aGV1cyJ9.SlORtZtVvLptYWE3zvblaOMxHNMBrHVTTHra7fO1RrwxdK3Bzc42ETLvkzVfQmAQlWrq5yiB4HiKFLe4qY2KVwK3qLDS_sWADLI16Sv8-O1Dt0oOQ0UZD0VOSGY0XEq2EGUxgxnx_JWllgEuMd0rjxtAtyjFh9wjCo_07lFCj44BffGGFp6Kovd8Dl_CJKpORakaJW-haIvTmTlbFPbRKojRTyKvtNCVn0zIXsz8Esp7z9XZmtUvZmHqNlY7bAFtGM9qLWUY_PkM1C0lQ2ZDKASdhZpx6LJr1Wo4WSNILCVfECT0sd6TnFHbgd1NwBc0kcTct5VbST76AJwUpG5esA
Pass the TOKEN above as a bearer token in the HTTP request
The curl command is as follows (run it on the node where kube-scheduler is deployed, usually a master)
curl -k -s https://localhost:10259/configz --header "Authorization: Bearer $TOKEN" |python -m json.tool
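The same request can be sketched in Go. `InsecureSkipVerify` plays the role of curl's `-k`, and the `Authorization` header carries the bearer token. To keep the example runnable without a cluster, it talks to a fake TLS server; `getConfigz`, `newFakeScheduler` and the `demo-token` value are hypothetical, and the fake server only imitates the token check.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// getConfigz sends GET <baseURL>/configz with a bearer token.
// Certificate verification is skipped, like curl -k.
func getConfigz(baseURL, token string) (int, string, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/configz", nil)
	if err != nil {
		return 0, "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)

	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

// newFakeScheduler is a stand-in for kube-scheduler's secure port:
// it accepts exactly one token and returns a tiny JSON document.
func newFakeScheduler(wantToken string) *httptest.Server {
	return httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer "+wantToken {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		io.WriteString(w, `{"componentconfig":{}}`)
	}))
}

func main() {
	srv := newFakeScheduler("demo-token")
	defer srv.Close()

	status, body, err := getConfigz(srv.URL, "demo-token")
	if err != nil {
		panic(err)
	}
	fmt.Println(status, body)
}
```

Against a real cluster, `baseURL` would be `https://localhost:10259` and the token would be the value read from the Secret above.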
Key point 04: analyze which plugins are enabled from the returned JSON configuration
- The JSON data is as follows
{
"componentconfig": {
"AlgorithmSource": {
"Policy": null,
"Provider": "DefaultProvider"
},
"ClientConnection": {"AcceptContentTypes": "","Burst": 100,"ContentType":"application/vnd.kubernetes.protobuf","Kubeconfig":"/etc/k8s/scheduler.conf","QPS": 50},
"EnableContentionProfiling": true,
"EnableProfiling": true,
"Extenders": null,
"HealthzBindAddress": "0.0.0.0:10251",
"LeaderElection": {
"LeaderElect": true,
"LeaseDuration": "15s",
"RenewDeadline": "10s",
"ResourceLock": "leases",
"ResourceName": "kube-scheduler",
"ResourceNamespace": "kube-system",
"RetryPeriod": "2s"
},
"MetricsBindAddress": "0.0.0.0:10251",
"Parallelism": 16,
"PercentageOfNodesToScore": 0,
"PodInitialBackoffSeconds": 1,
"PodMaxBackoffSeconds": 10,
"Profiles": [
{
"PluginConfig": [
{
"Args": {
"MinCandidateNodesAbsolute": 100,
"MinCandidateNodesPercentage": 10
},
"Name": "DefaultPreemption"
},
{
"Args": {"HardPodAffinityWeight": 1},
"Name": "InterPodAffinity"
},
{
"Args": {"AddedAffinity": null},
"Name": "NodeAffinity"
},
{
"Args": {
"IgnoredResourceGroups": null,
"IgnoredResources": null
},
"Name": "NodeResourcesFit"
},
{
"Args": {
"Resources": [
{
"Name": "cpu",
"Weight": 1
},
{
"Name": "memory",
"Weight": 1
}
]
},
"Name": "NodeResourcesLeastAllocated"
},
{
"Args": {
"DefaultConstraints": null,
"DefaultingType": "System"
},
"Name": "PodTopologySpread"
},
{
"Args": {"BindTimeoutSeconds": 600},
"Name": "VolumeBinding"
}
],
"Plugins": {
"Bind": {
"Disabled": null,
"Enabled": [
{
"Name": "DefaultBinder",
"Weight": 0
}
]
},
"Filter": {
"Disabled": null,
"Enabled": [
{
"Name": "NodeUnschedulable",
"Weight": 0
},
{
"Name": "NodeName",
"Weight": 0
},
{
"Name": "TaintToleration",
"Weight": 0
},
{
"Name": "NodeAffinity",
"Weight": 0
},
{
"Name": "NodePorts",
"Weight": 0
},
{
"Name": "NodeResourcesFit",
"Weight": 0
},
{
"Name": "VolumeRestrictions",
"Weight": 0
},
{
"Name": "EBSLimits",
"Weight": 0
},
{
"Name": "GCEPDLimits",
"Weight": 0
},
{
"Name": "NodeVolumeLimits",
"Weight": 0
},
{
"Name": "AzureDiskLimits",
"Weight": 0
},
{
"Name": "VolumeBinding",
"Weight": 0
},
{
"Name": "VolumeZone",
"Weight": 0
},
{
"Name": "PodTopologySpread",
"Weight": 0
},
{
"Name": "InterPodAffinity",
"Weight": 0
}
]
},
"Permit": {
"Disabled": null,
"Enabled": null
},
"PostBind": {
"Disabled": null,
"Enabled": null
},
"PostFilter": {
"Disabled": null,
"Enabled": [
{
"Name": "DefaultPreemption",
"Weight": 0
}
]
},
"PreBind": {
"Disabled": null,
"Enabled": [
{
"Name": "VolumeBinding",
"Weight": 0
}
]
},
"PreFilter": {
"Disabled": null,
"Enabled": [
{
"Name": "NodeResourcesFit",
"Weight": 0
},
{
"Name": "NodePorts",
"Weight": 0
},
{
"Name": "PodTopologySpread",
"Weight": 0
},
{
"Name": "InterPodAffinity",
"Weight": 0
},
{
"Name": "VolumeBinding",
"Weight": 0
}
]
},
"PreScore": {
"Disabled": null,
"Enabled": [
{
"Name": "InterPodAffinity",
"Weight": 0
},
{
"Name": "PodTopologySpread",
"Weight": 0
},
{
"Name": "TaintToleration",
"Weight": 0
}
]
},
"QueueSort": {
"Disabled": null,
"Enabled": [
{
"Name": "PrioritySort",
"Weight": 0
}
]
},
"Reserve": {
"Disabled": null,
"Enabled": [
{
"Name": "VolumeBinding",
"Weight": 0
}
]
},
"Score": {
"Disabled": null,
"Enabled": [
{
"Name": "NodeResourcesBalancedAllocation",
"Weight": 1
},
{
"Name": "ImageLocality",
"Weight": 1
},
{
"Name": "InterPodAffinity",
"Weight": 1
},
{
"Name": "NodeResourcesLeastAllocated",
"Weight": 1
},
{
"Name": "NodeAffinity",
"Weight": 1
},
{
"Name": "NodePreferAvoidPods",
"Weight": 10000
},
{
"Name": "PodTopologySpread",
"Weight": 2
},
{
"Name": "TaintToleration",
"Weight": 1
}
]
}
},
"SchedulerName": "default-scheduler"
}
]
}
}
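Reading this JSON by eye is tedious, so here is a small Go sketch that lists the enabled plugins per extension point. The struct names and the `enabledPlugins` helper are made up for the example, they only mirror the field names visible in the output above, and the `sample` constant is a shortened copy of that output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

type plugin struct {
	Name   string
	Weight int32
}

type pluginSet struct {
	Enabled  []plugin
	Disabled []plugin
}

// configz mirrors only the parts of the /configz output used here.
type configz struct {
	ComponentConfig struct {
		Profiles []struct {
			SchedulerName string
			Plugins       map[string]pluginSet
		}
	} `json:"componentconfig"`
}

// enabledPlugins maps each extension point of the first profile
// to the names of its enabled plugins, in the order they appear.
func enabledPlugins(raw []byte) (map[string][]string, error) {
	var c configz
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	if len(c.ComponentConfig.Profiles) == 0 {
		return out, nil
	}
	for point, set := range c.ComponentConfig.Profiles[0].Plugins {
		names := []string{}
		for _, p := range set.Enabled {
			names = append(names, p.Name)
		}
		out[point] = names
	}
	return out, nil
}

// sample is a shortened version of the output shown above.
const sample = `{"componentconfig":{"Profiles":[{"SchedulerName":"default-scheduler","Plugins":{
 "Filter":{"Disabled":null,"Enabled":[{"Name":"NodeUnschedulable","Weight":0},{"Name":"NodeName","Weight":0}]},
 "Permit":{"Disabled":null,"Enabled":null},
 "Score":{"Disabled":null,"Enabled":[{"Name":"ImageLocality","Weight":1}]}}}]}}`

func main() {
	byPoint, err := enabledPlugins([]byte(sample))
	if err != nil {
		panic(err)
	}
	points := make([]string, 0, len(byPoint))
	for p := range byPoint {
		points = append(points, p)
	}
	sort.Strings(points) // map iteration order is random, so sort for stable output
	for _, p := range points {
		fmt.Printf("%-8s %v\n", p, byPoint[p])
	}
}
```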
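Prerequisite: the k8s scheduling framework (scheduler framework)
- Documentation location
- Make sure you know this architecture diagram by heart
Looking at componentconfig.Profiles, it mainly has two parts:
- The Plugins section shows which plugins are enabled and disabled at each extension point of the scheduling framework:
- At each phase, the plugins in the Enabled list run in order, and the plugins in the Disabled list do not run
- For example, the Filter phase (excerpted here) contains plugins we see all the time: NodeUnschedulable filters out unschedulable nodes, NodeName filters by node name, TaintToleration filters by taints and tolerations, and NodeResourcesFit filters by the pod's resource requests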
"Filter": {
"Disabled": null,
"Enabled": [
{
"Name": "NodeUnschedulable",
"Weight": 0
},
{
"Name": "NodeName",
"Weight": 0
},
{
"Name": "TaintToleration",
"Weight": 0
},
{
"Name": "NodeAffinity",
"Weight": 0
},
{
"Name": "NodePorts",
"Weight": 0
},
{
"Name": "NodeResourcesFit",
"Weight": 0
}
]
},
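- The PluginConfig section holds the configuration fields of each plugin. Each entry pairs a plugin Name with an Args object of key-value pairs: every plugin has different configuration fields, so a free-form key-value structure is the most convenient
- For example, the NodeResourcesLeastAllocated plugin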
{
"Args": {
"Resources": [
{
"Name": "cpu",
"Weight": 1
},
{
"Name": "memory",
"Weight": 1
}
]
},
"Name": "NodeResourcesLeastAllocated"
},
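One way to handle such per-plugin Args in Go is to keep them as raw JSON first and decode them only once the plugin name is known. The sketch below does this for the NodeResourcesLeastAllocated entry; the type names and the `leastAllocatedResources` helper are hypothetical and follow the JSON shown above, not the scheduler's internal types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginConfig keeps Args undecoded because every plugin has its own schema.
type pluginConfig struct {
	Name string
	Args json.RawMessage
}

type resourceSpec struct {
	Name   string
	Weight int64
}

// leastAllocatedArgs matches the Args shown for NodeResourcesLeastAllocated.
type leastAllocatedArgs struct {
	Resources []resourceSpec
}

// leastAllocatedResources finds the NodeResourcesLeastAllocated entry
// in a PluginConfig list and decodes its Args.
func leastAllocatedResources(raw []byte) ([]resourceSpec, error) {
	var entries []pluginConfig
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	for _, e := range entries {
		if e.Name != "NodeResourcesLeastAllocated" {
			continue
		}
		var args leastAllocatedArgs
		if err := json.Unmarshal(e.Args, &args); err != nil {
			return nil, err
		}
		return args.Resources, nil
	}
	return nil, nil // plugin not present
}

// samplePluginConfig is a shortened PluginConfig list from the output above.
const samplePluginConfig = `[
 {"Args":{"BindTimeoutSeconds":600},"Name":"VolumeBinding"},
 {"Args":{"Resources":[{"Name":"cpu","Weight":1},{"Name":"memory","Weight":1}]},"Name":"NodeResourcesLeastAllocated"}
]`

func main() {
	res, err := leastAllocatedResources([]byte(samplePluginConfig))
	if err != nil {
		panic(err)
	}
	for _, r := range res {
		fmt.Printf("%s weight=%d\n", r.Name, r.Weight)
	}
}
```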
Key point 06: discussion of plugin weights
- In the configuration above, plugins in many phases, such as Filter and Score, carry a Weight field
{
"Name": "NodePreferAvoidPods",
"Weight": 10000
},
- But only plugins in Score have a Weight greater than 0
- Plugins in all other phases have a Weight of 0
Check this against the code: find Plugin.Weight (1.22 branch)
- Location: D:\go_path\src\github.com\kubernetes\kubernetes\pkg\scheduler\apis\config\types.go
// Plugin specifies a plugin name and its weight when applicable. Weight is used only for Score plugins.
type Plugin struct {
// Name defines the name of plugin
Name string
// Weight defines the weight of plugin, only used for Score plugins.
Weight int32
}
- The Plugin struct is just a name and a weight
- The comments state explicitly that the weight only takes effect for plugins in the Score phase
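To make the role of the weight concrete, here is a simplified sketch of a weighted sum over Score plugins for a single node. It is an illustration under assumptions, not the scheduler's actual code: `weightedTotal` is a hypothetical helper and the per-plugin scores are made-up numbers.

```go
package main

import "fmt"

// Plugin mirrors the two fields of the struct shown above.
type Plugin struct {
	Name   string
	Weight int32
}

// weightedTotal adds up score*weight over the given plugins for one node.
// A plugin with weight 0 contributes nothing to the total.
func weightedTotal(plugins []Plugin, scores map[string]int64) int64 {
	var total int64
	for _, p := range plugins {
		total += scores[p.Name] * int64(p.Weight)
	}
	return total
}

func main() {
	plugins := []Plugin{
		{Name: "ImageLocality", Weight: 1},
		{Name: "PodTopologySpread", Weight: 2},
	}
	// Hypothetical scores that each plugin gave to the same node.
	scores := map[string]int64{"ImageLocality": 50, "PodTopologySpread": 40}
	fmt.Println(weightedTotal(plugins, scores)) // 50*1 + 40*2 = 130
}
```

Under this reading, a very large weight such as the 10000 on NodePreferAvoidPods lets one plugin dominate the total, while a weight of 0 in the other phases simply means the field is unused there.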
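Recommended articles by teacher Xiaoyi on k8s scheduler source code
- Source-code details of how the default k8s scheduler filters on pod resource requests
- Reading the kube-scheduler source code, starting from a slow-scheduling alert in k8s cluster e2e
So how should a scheduler that schedules based on real load be written?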
Posted in: kubernetes
2023-02-24