Test environment
Cloud side:
- OS: Ubuntu Server 20.04.1 LTS 64bit
- Kubernetes: v1.19.8
- Network plugin: calico v3.16.3
- Cloudcore: kubeedge/cloudcore:v1.6.1
Edge side:
- OS: Ubuntu Server 18.04.5 LTS 64bit
- EdgeCore: v1.19.3-kubeedge-v1.6.1
- docker:
  - version: 20.10.7
  - cgroupDriver: systemd
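The cgroupDriver that edged uses (configured further down) has to match the driver docker actually runs with. A quick check on the edge node, using the standard docker CLI (not part of the original walkthrough):

docker info --format '{{.CgroupDriver}}'    # should print "systemd" to match edged's cgroupDriver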
Edge node registration quick start:
References:
https://docs.kubeedge.io/en/d…
https://docs.kubeedge.io/en/d…
- Get the token from cloudcore
kubectl get secret -nkubeedge tokensecret -o=jsonpath='{.data.tokendata}' | base64 -d
- Configure edgecore
If you installed from the binary release, first generate the minimal edgecore config file: edgecore --minconfig > edgecore.yaml
This file is a good starting point for anyone new to KubeEdge; it is about as minimal as the config gets.
Modify the key settings (only the key settings are listed here):
modules:
  edgeHub:
    ...
    httpServer: https://<cloudcore HttpServer listen address>:<port, default 10002>
    token: <token string obtained in the previous step>
    websocket:
      ...
      server: <cloudcore listen address>:<port, default 10000>
    ...
  edged:
    cgroupDriver: systemd              # keep consistent with docker's native.cgroupdriver
    ...
    hostnameOverride: edge01           # the name this node registers with on cloudcore
    nodeIP: <this node's IP address>   # defaults to the local IP; double-check on multi-NIC hosts
    ...
  eventBus:
    mqttMode: 0                        # use the internal mqtt service
    ...
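Before starting edgecore it can save time to confirm that the edge node can actually reach cloudcore on those two ports. The address 192.168.1.10 below is only a placeholder for your own cloudcore listen address:

nc -zv 192.168.1.10 10000    # placeholder cloudcore IP; websocket port edgeHub connects to
nc -zv 192.168.1.10 10002    # cloudcore HttpServer port used for cert/token distribution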
If you installed with keadm, run:
keadm join --cloudcore-ipport=<IP cloudcore listens on>:<port, default 10002> --token=<token string obtained above>
After the command runs, the edgecore node is managed by systemctl, added to the services started at boot, and started right away; at this point it is not necessarily healthy yet.
As with the binary install, review and adjust the config file, which keadm generates at /etc/kubeedge/config/edgecore.yaml
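To double-check the generated file, the fields worth inspecting are the same ones listed above; a plain grep (not from the original article) pulls them out quickly:

grep -nE 'httpServer|token|server:|cgroupDriver|hostnameOverride|nodeIP|mqttMode' /etc/kubeedge/config/edgecore.yaml    # show the key fields with line numbers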
- Start the edgecore service
For a binary install: nohup ./edgecore --config edgecore.yaml > edgecore.log 2>&1 &
For a keadm install:
systemctl restart edgecore
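Either way, make sure the service actually came up before moving on. For the keadm/systemd setup the standard commands are enough (for a binary install, tail edgecore.log instead):

systemctl status edgecore    # should report active (running)
journalctl -u edgecore -f    # follow the logs and watch for registration errors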
- Verify the node has joined
On the cloud-side master node, run:
root@master01:/home/ubuntu# kubectl get nodes
NAME       STATUS   ROLES        AGE   VERSION
edge01     Ready    agent,edge   10h   v1.19.3-kubeedge-v1.6.1
master01   Ready    master       53d   v1.19.8
master02   Ready    master       53d   v1.19.8
master03   Ready    master       53d   v1.19.8
node01     Ready    worker       53d   v1.19.8
node02     Ready    worker       53d   v1.19.8
I enabled automatic registration on cloudcore, so the edge node already shows up as registered.
Troubleshooting
However, when checking the pods running on the edge node, I found that calico, kube-proxy, and nodelocaldns pods had been scheduled onto it automatically:
root@master01:/home/ubuntu# kubectl get pod -A -o wide | grep edge01
kube-system calico-node-l2h8l 0/1 Init:Error 2 52s 172.31.100.15 edge01 <none> <none>
kube-system kube-proxy-m6rbk 1/1 Running 0 2m22s 172.31.100.15 edge01 <none> <none>
kube-system nodelocaldns-hr7fk 0/1 Error 2 30s 172.31.100.15 edge01 <none> <none>
Where:
- calico fails during initialization with Init:Error
- nodelocaldns reports Error, reason: ContainersNotReady
- kube-proxy deployed successfully
Note: other articles online claim that a failed network plugin does not prevent you from using the edge node. In this test environment it does: I pushed a test deployment of nginx and its Pod stayed stuck in Pending.
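When a pod sits in Pending like that, the Events section of kubectl describe usually points at the cause (the pod name below is just an example; substitute the actual Pending pod):

kubectl -n test-ns describe pod nginx-edge-946d96f44-n2h8v    # example name; check the Events at the end of the output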
- Root-cause analysis:
- For the calico init error, an issue from December 2020 says CNI support is still under development and not yet available.
- Still to be investigated; my guess is a compatibility problem or the network plugin itself.
- kube-proxy must not run on an edge node; if kube-proxy is present, edgecore reports the following error in its log at startup:
Failed to check the running environment: Kube-proxy should not running on edge node when running edgecore
https://github.com/kubeedge/k…
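On the edge node itself you can confirm whether a kube-proxy container is still around before re-registering (plain docker commands, not part of the original write-up):

docker ps -a --filter 'name=kube-proxy' --format '{{.Names}}\t{{.Status}}'    # list any kube-proxy containers and their state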
- A closer look shows these pods are created by daemonsets:
root@master01:/home/ubuntu# kubectl get daemonset -A
NAMESPACE     NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   calico-node    5         5         5       5            5           kubernetes.io/os=linux   53d
kube-system   kube-proxy     5         5         5       5            5           kubernetes.io/os=linux   53d
kube-system   nodelocaldns   5         5         5       5            5           <none>                   53d
- Edit their yaml:
kubectl edit daemonset -n kube-system calico-node
kubectl edit daemonset -n kube-system kube-proxy
kubectl edit daemonset -n kube-system nodelocaldns
Add a node affinity rule (under the daemonset's pod template spec, i.e. spec.template.spec) so these pods no longer land on edge nodes:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/edge
            operator: DoesNotExist
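If you would rather not open an editor, the same change can be applied with kubectl patch; this reuses the patch JSON from the script further down, shown here for calico-node (repeat for kube-proxy and nodelocaldns):

kubectl -n kube-system patch daemonset calico-node --type merge --patch '{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"node-role.kubernetes.io/edge","operator":"DoesNotExist"}]}]}}}}}}}'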
- Stop the edgecore service
root@edge01:/usr/local/edge# systemctl stop edgecore
- Remove the edge node from the k8s cluster
root@master01:/home/ubuntu# kubectl drain edge01 --delete-local-data --force --ignore-daemonsets
root@master01:/home/ubuntu# kubectl delete node edge01
- Restart docker on the edge node
root@edge01:/usr/local/edge# systemctl restart docker
- Restart edgecore
root@edge01:/usr/local/edge# systemctl start edgecore
The edge node now re-registers successfully, and this time no pods are running on it:
root@master01:/home/ubuntu# kubectl get nodes
NAME       STATUS   ROLES        AGE    VERSION
edge01     Ready    agent,edge   8m3s   v1.19.3-kubeedge-v1.6.1
master01   Ready    master       53d    v1.19.8
master02   Ready    master       53d    v1.19.8
master03   Ready    master       53d    v1.19.8
node01     Ready    worker       53d    v1.19.8
node02     Ready    worker       53d    v1.19.8
root@master01:/home/ubuntu# kubectl get pod -A -o wide | grep edge01
root@master01:/home/ubuntu#
Note:
Likewise, any resource that should not run on the edge node needs the same affinity configured. When creating new resources (especially daemonsets and cronjobs), pay attention to which nodes they may run on; otherwise the pods will error out, or pods whose restartPolicy is Always will restart endlessly. Instead of patching everything by hand, the script below can do it (I have not verified it; personally I would rather write a script per resource type, so I know exactly what was changed):
https://github.com/kubesphere…
#!/bin/bash
# Patch JSONs: one pins workloads to master/worker nodes via nodeSelector, the other
# keeps them off edge nodes via nodeAffinity (only the second one is applied below).
NodeSelectorPatchJson='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/master":"","node-role.kubernetes.io/worker": ""}}}}}'
NoShedulePatchJson='{"spec":{"template":{"spec":{"affinity":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"node-role.kubernetes.io/edge","operator":"DoesNotExist"}]}]}}}}}}}'

# Edge node name; defaults to "edgenode", override with the first argument
edgenode="edgenode"
if [ -n "$1" ]; then
    edgenode="$1"
fi

# Collect the namespace and name of every pod currently scheduled on the edge node
namespaces=($(kubectl get pods -A -o wide | egrep -i $edgenode | awk '{print $1}'))
pods=($(kubectl get pods -A -o wide | egrep -i $edgenode | awk '{print $2}'))
length=${#namespaces[@]}

for ((i = 0; i < $length; i++)); do
    ns=${namespaces[$i]}
    pod=${pods[$i]}
    # Find the controller that owns the pod and patch it with the edge-excluding affinity
    resources=$(kubectl -n $ns describe pod $pod | grep "Controlled By" | awk '{print $3}')
    echo "Patching for ns: ${namespaces[$i]}, resources: $resources"
    kubectl -n $ns patch $resources --type merge --patch "$NoShedulePatchJson"
    sleep 1
done
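Saved as, say, patch-edge-affinity.sh (a hypothetical filename), it would be run with the edge node name as the argument:

bash patch-edge-affinity.sh edge01    # hypothetical filename; edge01 is the node registered above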
Trying a deployment on the edge node
- Write a deployment that runs nginx as a test
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nginx-edge
  namespace: test-ns
  labels:
    app: nginx-edge
  annotations:
    deployment.kubernetes.io/revision: '1'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-edge
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx-edge
    spec:
      containers:
        - name: nginx-edge01
          image: 'nginx:latest'
          ports:
            - name: tcp-80
              containerPort: 80
              protocol: TCP
          resources:
            limits:
              cpu: 300m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 10Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/hostname: edge01
      serviceAccountName: default
      serviceAccount: default
      securityContext: {}
      affinity: {}
      schedulerName: default-scheduler
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
- Check the deployed nginx
root@master01:/home/ubuntu# kubectl get pod -A -o wide | grep edge01
test-ns       nginx-edge-946d96f44-n2h8v   1/1     Running   0          40s     172.17.0.2      edge01   <none>   <none>
At this point nginx has been deployed successfully on the edge side.
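To confirm the container actually serves traffic, it can be curled from the edge node itself; 172.17.0.2 is the pod IP from the output above, which sits on the local docker bridge and is therefore reachable from edge01 but not from other nodes without additional networking:

root@edge01:/usr/local/edge# curl -s http://172.17.0.2 | head -n 4    # should print the start of the nginx welcome page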