关于后端:Kubernetes-自动化诊断工具k8sgptoperator

35次阅读

共计 3572 个字符,预计需要花费 9 分钟才能阅读完成。

背景

在 Kubernetes 上,从部署 Deployment 到失常提供服务,整个流程可能会呈现各种各样问题,有趣味的能够浏览 Kubernetes Deployment 的故障排查可视化指南(2021 中文版)。从可视化指南也可能看出这些问题实际上都是有迹可循,依据错误信息根本很容易找到解决办法。随着 ChatGPT 的风行,基于 LLM 的文本生成我的项目不断涌现,k8sgpt 便是其中之一。

k8sgpt 是一个扫描 Kubernetes 集群、诊断和分类问题的工具。它将 SRE 教训编入其分析器,并通过 AI 帮忙提取并丰盛相干的信息。

其内置了大量的分析器:

  • podAnalyzer
  • pvcAnalyzer
  • rsAnalyzer
  • serviceAnalyzer
  • eventAnalyzer
  • ingressAnalyzer
  • statefulSetAnalyzer
  • deploymentAnalyzer
  • cronJobAnalyzer
  • nodeAnalyzer
  • hpaAnalyzer(可选)
  • pdbAnalyzer(可选)
  • networkPolicyAnalyzer(可选)

k8sgpt 的能力是通过 CLI 来提供的,通过 CLI 能够对集群中的谬误进行疾速的诊断。

k8sgpt analyze --explain --filter=Pod --namespace=default --output=json
{
  "status": "ProblemDetected",
  "problems": 1,
  "results": [
    {
      "kind": "Pod",
      "name": "default/test",
      "error": [
        {"Text": "Back-off pulling image \"flomesh/pipy2\"","Sensitive": []
        }
      ],
      "details": "The Kubernetes system is experiencing difficulty pulling the requested image named \"flomesh/pipy2\". \n\nThe solution may be to check that the image is correctly spelled or to verify that it exists in the specified container registry. Additionally, ensure that the networking infrastructure that connects the container registry and Kubernetes system is working properly. Finally, check if there are any access restrictions or credentials required to pull the image and ensure they are provided correctly.",
      "parentObject": "test"
    }
  ]
}

然而,每次进行诊断都要执行命令,有点繁琐且限度较多。我想大家想要的必定是可能监控到问题并主动诊断。这就有了明天要介绍的 k8sgpt-operator

介绍

简略来说 k8sgpt-operator 能够在集群中开启自动化的 k8sgpt。它提供了两个 CRD:K8sGPTResult。前者能够用来设置 k8sgpt 及其行为;而后者则是用来展现问题资源的诊断后果。

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: kube-system
spec:
  model: gpt-3.5-turbo
  backend: openai
  noCache: false
  version: v0.2.7
  enableAI: true
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key

演示

试验环境应用 k3s 集群。

export INSTALL_K3S_VERSION=v1.23.8+k3s2
curl -sfL https://get.k3s.io | sh -s - --disable traefik --disable local-storage --disable servicelb --write-kubeconfig-mode 644 --write-kubeconfig ~/.kube/config

装置 k8sgpt-operator

helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install release k8sgpt/k8sgpt-operator -n openai --create-namespace

装置实现后,能够看到随 operator 装置的两个 CRD:k8sgptsresults

kubectl api-resources | grep -i gpt
k8sgpts                                        core.k8sgpt.ai/v1alpha1                true         K8sGPT
results                                        core.k8sgpt.ai/v1alpha1                true         Result

在开始之前,须要学生成一个 OpenAI 的 key,并保留到 secret 中。

OPENAI_TOKEN=xxxx
kubectl create secret generic k8sgpt-sample-secret --from-literal=openai-api-key=$OPENAI_TOKEN -n openai

接下来创立 K8sGPT 资源。

kubectl apply -n openai -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
spec:
  model: gpt-3.5-turbo
  backend: openai
  noCache: false
  version: v0.2.7
  enableAI: true
  secret:
    name: k8sgpt-sample-secret
    key: openai-api-key
EOF

执行完下面的命令后在 openai 命名空间下会主动创立 Deployment k8sgpt-deployment

测试

应用一个不存在的镜像创立 pod。

kubectl run test --image flomesh/pipy2 -n default

而后在 openai 命名空间下会看到一个名为 defaulttest 的资源。

kubectl get result -n openai
NAME          AGE
defaulttest   5m7s

详细信息中能够看到诊断内容以及呈现问题的资源。

kubectl get result -n openai defaulttest -o yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  creationTimestamp: "2023-05-02T09:00:32Z"
  generation: 1
  name: defaulttest
  namespace: openai
  resourceVersion: "1466"
  uid: 2ee27c26-61c1-4ef5-ae27-e1301a40cd56
spec:
  details: "The error message is indicating that Kubernetes is having trouble pulling
    the image \"flomesh/pipy2\" and is therefore backing off from trying to do so.
    \n\nThe solution to this issue would be to check that the image exists and that
    the spelling and syntax of the image name is correct. Additionally, check that
    the image is accessible from the Kubernetes cluster and that any required authentication
    or authorization is in place. If the issue persists, it may be necessary to troubleshoot
    the network connectivity between the Kubernetes cluster and the image repository."
  error:
  - text: Back-off pulling image "flomesh/pipy2"
  kind: Pod
  name: default/test
  parentObject: test

关注 ” 云原生指北 ” 公众号
(转载本站文章请注明作者和出处盛世浮生,请勿用于任何商业用途)

正文完
 0