Environment
CentOS-7-x86_64-DVD-1810
Docker 19.03.9
Kubernetes version: v1.20.5
Before you begin
One or more Linux machines running a deb- or rpm-compatible distribution
At least 2 GB of RAM on every machine
At least 2 CPU cores on the machine that will act as the control-plane node
Full network connectivity between all machines in the cluster
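The hardware requirements above can be checked up front with a couple of small helpers. This is only a sketch: the function names are made up, and kubeadm performs equivalent preflight checks itself; the thresholds mirror the list above (2 CPUs, 2 GB RAM).

```shell
# Sketch: preflight helpers mirroring the requirements listed above.
# The function names are hypothetical; kubeadm runs equivalent checks itself.

# Succeeds if the CPU count (argument, or nproc when omitted) is at least 2.
min_cpus_ok() {
  local cpus="${1:-$(nproc)}"
  [ "$cpus" -ge 2 ]
}

# Succeeds if MemTotal in kB (argument, or /proc/meminfo when omitted)
# is at least 2 GB.
min_mem_ok() {
  local kb="${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
  [ "$kb" -ge $((2 * 1024 * 1024)) ]
}
```

Run them without arguments on each prospective node; kubeadm init will fail on the same limits anyway, but checking early avoids a half-finished install.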
Objectives
- Install a single control-plane Kubernetes cluster
- Install a Pod network on the cluster so that Pods can communicate with each other
Installation guide
Install Docker
The installation steps are omitted here.
Note: when installing Docker, pick a version supported by Kubernetes (see below). If the installed Docker version is too new, kubeadm prints a warning like the following:
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
Specify the version when installing Docker:
sudo yum install docker-ce-19.03.9 docker-ce-cli-19.03.9 containerd.io
If Docker is not installed at all, running kubeadm init fails with:
cannot automatically set CgroupDriver when starting the Kubelet: cannot execute 'docker info -f {{.CgroupDriver}}': executable file not found in $PATH
[preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec:"docker": executable file not found in $PATH
Install kubeadm
If kubeadm is not installed yet, install it first. If it is already installed, you can upgrade it to the latest version with apt-get update && apt-get upgrade or yum update.
Note: while kubeadm is being upgraded, kubelet restarts every few seconds; this is expected.
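On CentOS 7, a common way to do this is to add a Kubernetes yum repository and install version-pinned packages. The snippet below is a sketch under assumptions: the baseurl points at the Alibaba Cloud mirror (consistent with the --image-repository used later in this article), GPG checking is disabled for brevity, and the pinned version matches this article's v1.20.5; adjust all of these for your environment.

```shell
# Sketch: install a pinned kubeadm/kubelet/kubectl on CentOS 7.
# Assumptions: Aliyun mirror as the repo source, gpgcheck disabled for
# brevity, versions matching this article (1.20.5). Run as root.
cat > /etc/yum.repos.d/kubernetes.repo <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
yum install -y kubelet-1.20.5 kubeadm-1.20.5 kubectl-1.20.5
systemctl enable --now kubelet
```

Pinning all three packages to the same version avoids the version-skew warnings kubeadm emits when kubelet and kubeadm drift apart.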
Other prerequisites
Disable the firewall
# systemctl stop firewalld && systemctl disable firewalld
The command above stops and disables the firewall. If you skip it, kubeadm init reports:
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
Edit the /etc/docker/daemon.json file
Edit the /etc/docker/daemon.json file and add the following content:
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
Then restart Docker with systemctl restart docker.
If you skip this step, kubeadm init reports:
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
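The edit above can be scripted. The helper below is a sketch (the function name is made up); it takes the target path as an argument so it can be dry-run against a temporary file. Note that it overwrites the file wholesale; if your daemon.json already has other keys, merge them by hand instead.

```shell
# Sketch: write the systemd cgroup-driver setting for Docker.
# The function name is hypothetical; this OVERWRITES the target file.
write_docker_cgroup_config() {
  mkdir -p "$(dirname "$1")"
  printf '%s\n' \
    '{' \
    '  "exec-opts": ["native.cgroupdriver=systemd"]' \
    '}' > "$1"
}
# On a real node (as root):
# write_docker_cgroup_config /etc/docker/daemon.json && systemctl restart docker
```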
Install the socat and conntrack dependencies
# yum install socat conntrack-tools
If these packages are missing, kubeadm init reports:
[WARNING FileExisting-socat]: socat not found in system path
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileExisting-conntrack]: conntrack not found in system path
Set net.ipv4.ip_forward to 1
Set net.ipv4.ip_forward to 1 as follows:
# sysctl -w net.ipv4.ip_forward=1
net.ipv4.ip_forward = 1
Explanation: net.ipv4.ip_forward = 0 means packet forwarding is disabled, 1 means it is enabled. If the value is not 1, kubeadm init reports:
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
The setting above only lasts until reboot. To keep it from being lost after a restart, do the following:
# echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
Note: some articles online recommend the following way to make the setting permanent, but in the author's testing it does not actually work:
# echo "sysctl -w net.ipv4.ip_forward=1" >> /etc/rc.local
# chmod +x /etc/rc.d/rc.local
Set net.bridge.bridge-nf-call-iptables to 1
Follow the same approach as for net.ipv4.ip_forward.
Note: all of the steps above must be carried out on every node in the cluster.
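Both kernel settings can be persisted together in one drop-in file. The sketch below uses a hypothetical helper name and drop-in file name, and takes the target path as an argument so it can be exercised without root; appending to /etc/sysctl.conf, as shown above, works equally well.

```shell
# Sketch: persist the two kernel settings kubeadm checks.
# The function name and the drop-in file name are hypothetical.
write_k8s_sysctl() {
  printf '%s\n' \
    'net.ipv4.ip_forward = 1' \
    'net.bridge.bridge-nf-call-iptables = 1' > "$1"
}
# On a real node (as root), write the drop-in and apply it immediately:
# write_k8s_sysctl /etc/sysctl.d/99-kubernetes.conf && sysctl --system
```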
Initialize the control-plane node
The machine that runs the control-plane components, including etcd (the cluster database) and the API server (which the kubectl command-line tool talks to), is called the control-plane node.
1. (Recommended) If you plan to upgrade this single control-plane kubeadm cluster to high availability later, specify the --control-plane-endpoint option of kubeadm init to set a shared endpoint for all control-plane nodes. The endpoint can be a DNS name or a local load-balancer IP address.
2. Choose a network plugin, and check whether it needs options passed to kubeadm init; this depends on the plugin you choose. For example, flannel requires the --pod-network-cidr option.
3. (Optional) Since version 1.14, kubeadm auto-detects the container runtime. If you need a different runtime, or more than one runtime is installed, specify the --cri-socket option of kubeadm init.
4. (Optional) Unless told otherwise, kubeadm uses the network interface associated with the default gateway to set the advertise address of this control-plane node's API server. To use a different interface, pass --apiserver-advertise-address=<ip-address> to kubeadm init. To deploy an IPv6 Kubernetes cluster, pass an IPv6 address, e.g. --apiserver-advertise-address=fd00::101.
5. (Optional) Before running kubeadm init, run kubeadm config images pull to verify connectivity to the gcr.io container image registry.
Run kubeadm init with options, as shown below, to initialize the control-plane node. The command first runs a series of preflight checks to make sure the machine is able to run Kubernetes. If the checks find errors it exits automatically; otherwise it continues, downloading and installing the cluster's control-plane components. This may take several minutes:
# kubeadm init --image-repository=registry.aliyuncs.com/google_containers --kubernetes-version stable --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost.localdomain] and IPs [10.96.0.1 10.118.80.93]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost localhost.localdomain] and IPs [10.118.80.93 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 89.062309 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.20" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the labels "node-role.kubernetes.io/master=''" and "node-role.kubernetes.io/control-plane='' (deprecated)"
[mark-control-plane] Marking the node localhost.localdomain as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 1sh85v.surdstc5dbrmp1s2
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo \
--discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
As shown above, the command prints Your Kubernetes control-plane has initialized successfully! along with further instructions, telling us the control-plane node was initialized successfully.
Notes:
1. If you do not use the --image-repository option to point at the Alibaba Cloud mirror, you may get an error like the following:
failed to pull image "k8s.gcr.io/kube-apiserver:v1.20.5": output: Error response from daemon: Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
, error: exit status 1
2. Because the flannel network plugin is used, the --pod-network-cidr option must be specified. Otherwise the coredns-xxxxxxxxxx-xxxxx Pods cannot start and stay in the ContainerCreating state; describing them shows an error like:
networkPlugin cni failed to set up pod "coredns-7f89b7bc75-9vrrl_kube-system" network: open /run/flannel/subnet.env: no such file or directory
3. The --pod-network-cidr value, i.e. the Pod network, must not coincide with the host network; otherwise, once the flannel plugin is installed, duplicate routes are created and tools such as XShell can no longer ssh into the host. For example:
The actual host network was 10.118.80.0/24 on interface ens33, with
--pod-network-cidr=10.118.80.0/24
4. Also, note in particular that the --pod-network-cidr value must match the net-conf.json Network key in the kube-flannel.yml file (in this example the key is 10.244.0.0/16, as shown below, so --pod-network-cidr was set to 10.244.0.0/16):
# cat kube-flannel.yml | grep -E "^\s*\"Network\""
    "Network": "10.244.0.0/16",
On a first attempt, --pod-network-cidr=10.1.15.0/24 was set without changing the Network key in kube-flannel.yml; nodes that later joined the cluster could not get a pod CIDR assigned automatically:
# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system kube-flannel-ds-psts8 0/1 CrashLoopBackOff 62 15h
... 略
# kubectl -n kube-system logs kube-flannel-ds-psts8
... 略
E0325 01:03:08.190986 1 main.go:292] Error registering network: failed to acquire lease: node "k8snode1" pod cidr not assigned
W0325 01:03:08.192875 1 reflector.go:424] github.com/coreos/flannel/subnet/kube/kube.go:300: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
I0325 01:03:08.193782 1 main.go:371] Stopping shutdownHandler...
Changing the net-conf.json Network key in kube-flannel.yml to 10.1.15.0/24 afterwards (downloading kube-flannel.yml first, editing it, then installing the network plugin) produced the same error.
For the node "xxxxxx" pod cidr not assigned problem above, a temporary workaround also circulates online (not verified by the author): assign a podCIDR to the node manually with the following command:
kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}'
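The consistency check from note 4 can be automated: read the Network value out of kube-flannel.yml and compare it with the intended --pod-network-cidr before running kubeadm init. The helper below is a sketch; its name is made up, and the pattern matches the net-conf.json layout shown above.

```shell
# Sketch: extract the net-conf.json "Network" value from kube-flannel.yml.
# The function name is hypothetical; the pattern matches the layout above.
flannel_network() {
  grep -oE '"Network"[[:space:]]*:[[:space:]]*"[^"]+"' "$1" \
    | head -n1 | cut -d'"' -f4
}
# Example: abort early when the manifest and the planned CIDR disagree.
# [ "$(flannel_network kube-flannel.yml)" = "10.244.0.0/16" ] || echo "CIDR mismatch"
```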
5. As the output suggests, run the following commands so that non-root users can use kubectl normally:
# mkdir -p $HOME/.kube
# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
Record the kubeadm join command from the kubeadm init output; it is needed later to add nodes to the cluster.
The token is used for mutual authentication between the control-plane node and joining nodes. Keep it safe: anyone who holds it can add authenticated nodes to the cluster. Tokens can be listed, created, and deleted with kubeadm token; see the kubeadm reference guide for details.
Install a Pod network add-on
A Container Network Interface (CNI) based Pod network add-on must be deployed so that Pods can communicate with each other. Cluster DNS (CoreDNS) does not start until a Pod network is installed.
- Make sure the Pod network does not overlap with any host network; overlaps cause problems. (If your network plugin's preferred Pod network conflicts with some host network, pick a suitable CIDR block instead, pass it via --pod-network-cidr to kubeadm init, and replace the network in the plugin's YAML accordingly.)
- By default, kubeadm sets up the cluster to enforce RBAC (role-based access control). Make sure the Pod network plugin, and any manifests deployed with it, support RBAC.
- If the cluster should use IPv6 (dual-stack, or single-stack IPv6 only), make sure the plugin supports IPv6. CNI v0.6.0 added IPv6 support.
Quite a few projects use CNI to provide Kubernetes networking, and some of them also support Network Policy. A list of add-ons that implement the Kubernetes network model is available at:
https://kubernetes.io/docs/co…
A Pod network add-on can be installed on the control-plane node, or on any machine with kubeconfig credentials, by running the command below. The add-on installs itself as a DaemonSet and writes its configuration into the /etc/cni/net.d directory:
kubectl apply -f <add-on.yaml>
Install the flannel network plugin
Deploy flannel manually (Kubernetes v1.17+):
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
Reference: https://github.com/flannel-io…
Only one Pod network can be installed per cluster. Once the Pod network is installed, check whether it is working by running kubectl get pods --all-namespaces and verifying that the coredns-xxxxxxxxxx-xxx Pods are in the Running state.
Check flannel's subnet configuration:
# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
After the flannel plugin is installed, two new virtual interfaces appear on the host: cni0 and flannel.1.
# ifconfig -a
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.1 netmask 255.255.255.0 broadcast 10.244.0.255
inet6 fe80::705d:43ff:fed6:80c9 prefixlen 64 scopeid 0x20<link>
ether 72:5d:43:d6:80:c9 txqueuelen 1000 (Ethernet)
RX packets 312325 bytes 37811297 (36.0 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 356346 bytes 206539626 (196.9 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
inet6 fe80::42:e1ff:fec3:8b6a prefixlen 64 scopeid 0x20<link>
ether 02:42:e1:c3:8b:6a txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3 bytes 266 (266.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.118.80.93 netmask 255.255.255.0 broadcast 10.118.80.255
inet6 fe80::6ff9:dbee:6b27:1315 prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:d3:3b:ef txqueuelen 1000 (Ethernet)
RX packets 2092903 bytes 1103282695 (1.0 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 969483 bytes 253273828 (241.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.0.0 netmask 255.255.255.255 broadcast 10.244.0.0
inet6 fe80::a49a:2ff:fe38:3e4b prefixlen 64 scopeid 0x20<link>
ether a6:9a:02:38:3e:4b txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 30393748 bytes 5921348235 (5.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 30393748 bytes 5921348235 (5.5 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Re-initialize the control-plane node
During this exercise, wrong options were only discovered after the network plugin had been installed, so kubeadm init had to be run again. The actual steps were:
# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "localhost.localdomain" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config
After running the commands above, repeat the control-plane initialization steps and reinstall the network plugin.
Problems encountered
After re-running kubeadm init, kubectl get pods --all-namespaces showed the coredns-xxxxxxxxxx-xxxxxx Pods stuck in the ContainerCreating state:
# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-7f89b7bc75-pxvdx 0/1 ContainerCreating 0 8m33s
kube-system coredns-7f89b7bc75-v4p57 0/1 ContainerCreating 0 8m33s
kube-system etcd-localhost.localdomain 1/1 Running 0 8m49s
... 略
Running kubectl describe pod coredns-7f89b7bc75-pxvdx -n kube-system to inspect the Pod revealed the following error:
Warning FailedCreatePodSandBox 98s (x4 over 103s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "04434c63cdf067e698a8a927ba18e5013d2a1a21afa642b3cddedd4ff4592178" network for pod "coredns-7f89b7bc75-pxvdx": networkPlugin cni failed to set up pod "coredns-7f89b7bc75-pxvdx_kube-system" network: failed to set bridge addr: "cni0" already has an IP address different from 10.1.15.1/24
Inspecting the network interfaces showed that cni0 still had an IP address (assigned by the previous network-plugin installation), so this time the plugin failed to set its address:
# ifconfig -a
cni0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 10.118.80.1 netmask 255.255.255.0 broadcast 10.118.80.255
inet6 fe80::482d:65ff:fea6:32fd prefixlen 64 scopeid 0x20<link>
ether 4a:2d:65:a6:32:fd txqueuelen 1000 (Ethernet)
RX packets 267800 bytes 16035849 (15.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 116238 bytes 10285959 (9.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
... 略
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.1.15.0 netmask 255.255.255.255 broadcast 10.1.15.0
inet6 fe80::a49a:2ff:fe38:3e4b prefixlen 64 scopeid 0x20<link>
ether a6:9a:02:38:3e:4b txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
... 略
The fix is to delete the misconfigured cni0 interface. It is recreated automatically, after which everything works:
$ sudo ifconfig cni0 down
$ sudo ip link delete cni0
Control-plane node isolation (optional)
By default, for security reasons, the cluster does not schedule Pods on control-plane nodes. If you do want Pods scheduled there, for example in a single-machine development cluster, run:
kubectl taint nodes --all node-role.kubernetes.io/master- # remove the taint from every node whose labels start with node-role.kubernetes.io/master
In practice:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready control-plane,master 63m v1.20.5
# kubectl taint nodes --all node-role.kubernetes.io/master-
node/localhost.localdomain untainted
Join nodes to the cluster
Change the new node's hostname
# hostname
localhost.localdomain
# hostname k8sNode1
Changing the hostname with the command above only lasts until reboot. To survive reboots, edit the /etc/hostname file and replace the default localhost.localdomain with the target name (k8sNode1 in this example). If you skip this, later steps fail with:
[WARNING Hostname]: hostname "k8sNode1" could not be reached
[WARNING Hostname]: hostname "k8sNode1": lookup k8sNode1 on 223.5.5.5:53: read udp 10.118.80.94:33293->223.5.5.5:53: i/o timeout
Edit /etc/hosts and add a mapping from the node's hostname to its IP address (10.118.80.94 in this example):
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.118.80.94 k8sNode1
ssh into the target node, switch to the root user if you logged in as someone else, and run the kubeadm join command printed by kubeadm init on the control-plane machine, in the form:
kubeadm join --token <token> <control-plane-host>:<control-plane-port> --discovery-token-ca-cert-hash sha256:<hash>
Existing, unexpired tokens can be listed on the control-plane machine with:
# kubeadm token list
If no token is available, generate a new one on the control-plane machine with:
# kubeadm token create
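Rather than assembling the token and CA-cert hash by hand, kubeadm can generate a fresh token and print the complete join command in one step:

```shell
# Print a ready-to-run "kubeadm join ..." line for this cluster.
kubeadm token create --print-join-command
```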
In practice:
# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
On the control-plane (master) machine, check whether the node was added:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8snode1 NotReady <none> 74s v1.20.5
localhost.localdomain Ready control-plane,master 7h24m v1.20.5
As shown above, a new node k8snode1 has been added.
Problems encountered
Problem 1: kubeadm join failed with the following error:
# kubeadm join 10.118.80.93:6443 --token ap4vvq.8xxcc0uea7dxbjlo --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "ap4vvq"
To see the stack trace of this error execute with --v=5 or higher
Solution:
The token had expired; run kubeadm token create to generate a new one.
Problem 2: kubeadm join failed with the following error:
# kubeadm join 10.118.80.93:6443 --token pa0gxw.4vx2wud1e7e0rzbx --discovery-token-ca-cert-hash sha256:c4493c04d789463ecd25c97453611a9dfacb36f4d14d5067464832b9e9c5039a
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: cluster CA found in cluster-info ConfigMap is invalid: none of the public keys "sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f" are pinned
To see the stack trace of this error execute with --v=5 or higher
Solution:
The discovery-token-ca-cert-hash was no longer valid; re-derive it with the following command:
# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f
Use the hash value from the output:
--discovery-token-ca-cert-hash sha256:8e2f94e2f4f1b66c45d941c0a7f72e328c242346360751b5c1cf88f437ab854f
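The pipeline above can be wrapped in a small helper so it can be re-run whenever a join command needs to be assembled. The function name is made up; the pipeline itself is exactly the one shown above (it assumes an RSA CA key, which is what kubeadm generates by default).

```shell
# Sketch: compute the sha256 discovery hash of a CA certificate's public key.
# Hypothetical function name; the pipeline is the one shown above and
# assumes an RSA public key (kubeadm's default).
ca_cert_hash() {
  openssl x509 -pubkey -in "$1" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex | sed 's/^.* //'
}
# On the control-plane node:
# echo "--discovery-token-ca-cert-hash sha256:$(ca_cert_hash /etc/kubernetes/pki/ca.crt)"
```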
Problem 3: cni config uninitialized
The newly joined node showed status KubeletNotReady in the Kubernetes dashboard, with the following message:
[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful, runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized, CSINode is not yet initialized, missing node capacity for resources: ephemeral-storage]
Solution: reinstall the CNI plugin binaries (this setup used virtual machines, and the snapshot in use probably did not include the network plugin), then clean up the node and join it to the cluster again:
# CNI_VERSION="v0.8.2"
# mkdir -p /opt/cni/bin
# curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-linux-amd64-${CNI_VERSION}.tgz" | sudo tar -C /opt/cni/bin -xz
Tear down
If disposable servers were used for testing, you can simply shut them down; no further cleanup is needed. kubectl config delete-cluster can delete the local references to the cluster (not tested by the author).
However, if you want to dismantle the cluster more cleanly, you should first drain the node and make sure it is empty before removing it.
Remove a node
Steps on the control-plane node
First run the following command on the control-plane node to drain the target node, forcibly evicting its workloads:
kubectl drain <node name> --delete-emptydir-data --force --ignore-daemonsets
In practice:
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8snode1 Ready <none> 82m v1.20.5
localhost.localdomain Ready control-plane,master 24h v1.20.5
# kubectl drain k8snode1 --delete-emptydir-data --force --ignore-daemonsets
node/k8snode1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-4xqcc, kube-system/kube-proxy-c7qzs
evicting pod default/nginx-deployment-64859b8dcc-v5tcl
evicting pod default/nginx-deployment-64859b8dcc-qjrld
evicting pod default/nginx-deployment-64859b8dcc-rcvc8
pod/nginx-deployment-64859b8dcc-rcvc8 evicted
pod/nginx-deployment-64859b8dcc-qjrld evicted
pod/nginx-deployment-64859b8dcc-v5tcl evicted
node/k8snode1 evicted
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready control-plane,master 24h v1.20.5
Steps on the target node
Log in to the target node and run:
# kubeadm reset
The command above does not reset or clean up iptables rules or IPVS tables. To reset iptables, you must additionally run:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
To reset IPVS, run the following command:
ipvsadm -C
Note: unless you have a specific need, do not reset the network.
Delete the node's configuration files:
# rm -rf /etc/cni/net.d
# rm -f $HOME/.kube/config
Steps on the control-plane node
Delete the node with kubectl delete node <node name>:
### delete the pods that were left behind first
# kubectl delete pod kube-flannel-ds-4xqcc -n kube-system --force
# kubectl delete pod kube-proxy-c7qzs -n kube-system --force
# kubectl delete node k8snode1
node "k8snode1" deleted
After deletion, the node can be re-joined to the cluster by running kubeadm join again with the appropriate arguments.
Clean up the control plane
On the control-plane node, use the kubeadm reset command. See the kubeadm reset command reference for details.
Source: https://www.cnblogs.com/shouk…