This article fills a gap I left open a long time ago.
The source versions of the components covered in this article are:
- Kubernetes 1.24
- CRI 0.25.0
- Containerd 1.6
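A container runtime is the component responsible for managing and executing containers. It turns a container image into an actual container process running on the host, and provides image management, container lifecycle management, resource isolation, file systems, network configuration, and related functionality.
The common container runtimes are listed below; they differ in the features and performance they offer. All of them work with Kubernetes through the Container Runtime Interface (CRI), either natively or through an adapter (cri-dockerd in the case of Docker Engine), so that they can integrate with Kubernetes or other container orchestration systems to schedule and manage containers.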
- containerd
- CRI-O
- Docker Engine
- Mirantis Container Runtime
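With CRI, we can also switch "freely" among these container runtimes without recompiling Kubernetes. In short, CRI defines all operations on containers and serves as the standard interface between the container orchestration system and the container runtime.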
The Past and Present of CRI
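CRI was first introduced in Kubernetes 1.5, with v1alpha1 as its initial version. Before that, Kubernetes had to maintain support for each container runtime inside the kubelet source code.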
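With CRI in place, the kubelet only needs to support CRI, and it talks to the container runtime through an intermediate layer, the CRI shim (a gRPC server). The shim was needed because, at that time, the container runtime implementations did not yet support CRI themselves.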
Kubernetes 1.24, released last year, officially removed Dockershim, which simplified the interaction with container runtimes.
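Kubernetes currently supports the v1alpha2 and v1 versions of CRI; v1 was introduced in Kubernetes 1.23. Every time the kubelet starts, it first tries to connect to the container runtime with the v1 API, and falls back to v1alpha2 only if that fails.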
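The fallback order described above can be sketched in a few lines of Go. This is only an illustration, not the kubelet's code: `probeFunc`, `negotiateCRIVersion`, and `legacyRuntime` are hypothetical names, and the probe stands in for whatever call the client uses to check that the runtime answers on a given API version.

```go
package main

import (
	"errors"
	"fmt"
)

// probeFunc is a hypothetical stand-in for "ask the runtime whether it
// answers on this CRI API version"; it returns nil on success.
type probeFunc func(apiVersion string) error

// negotiateCRIVersion tries the API versions in preference order (v1 first,
// then v1alpha2) and returns the first one the runtime accepts.
func negotiateCRIVersion(probe probeFunc) (string, error) {
	var lastErr error
	for _, v := range []string{"v1", "v1alpha2"} {
		err := probe(v)
		if err == nil {
			return v, nil
		}
		lastErr = fmt.Errorf("CRI %s: %w", v, err)
	}
	return "", lastErr
}

// legacyRuntime simulates a runtime that only implements v1alpha2.
func legacyRuntime(apiVersion string) error {
	if apiVersion != "v1alpha2" {
		return errors.New("unimplemented")
	}
	return nil
}

func main() {
	v, err := negotiateCRIVersion(legacyRuntime)
	fmt.Println(v, err) // v1alpha2 <nil>
}
```

Keeping the preferred versions in an ordered slice means that adding or dropping a version is a one-line change.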
kubelet and CRI
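In my earlier kubelet source code analysis I mentioned that Kubelet#syncLoop() continuously watches for changes from files, the apiserver, and HTTP in order to update pod state. That article stopped there, because the work that follows, creating and running the _sandbox_ and the various containers, is handed over to the container runtime; see kubeGenericRuntimeManager#SyncPod().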
When the kubelet starts, it initializes the CRI client, connects to the container runtime, and confirms the CRI version.
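While creating a pod, the kubelet interacts with the container runtime through CRI at each of these steps:
- Create the sandbox
- Create the containers
- Pull the images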
Source references
- pkg/kubelet/kuberuntime/kuberuntime_sandbox.go#L39
- pkg/kubelet/kuberuntime/kuberuntime_container.go#L176
- pkg/kubelet/images/image_manager.go#L89
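To make the three interactions listed above concrete, here is a minimal sketch of a kubelet-like caller driving a runtime. Everything in it is an assumption for illustration: `criRuntime` is a drastically simplified, hypothetical subset of CRI (the real methods take request and response structs), `recordingRuntime` is a fake that only logs calls, and the order "pull, then create, then start" per container is how I understand the flow rather than something this article's source listing spells out.

```go
package main

import "fmt"

// criRuntime is a hypothetical, trimmed-down view of the CRI surface.
type criRuntime interface {
	RunPodSandbox(pod string) (sandboxID string, err error)
	PullImage(image string) error
	CreateContainer(sandboxID, name, image string) (containerID string, err error)
	StartContainer(containerID string) error
}

// containerSpec carries the two fields this sketch needs.
type containerSpec struct{ Name, Image string }

// recordingRuntime is a fake runtime that records every call it receives.
type recordingRuntime struct{ calls []string }

func (r *recordingRuntime) RunPodSandbox(pod string) (string, error) {
	r.calls = append(r.calls, "RunPodSandbox:"+pod)
	return "sb-" + pod, nil
}

func (r *recordingRuntime) PullImage(image string) error {
	r.calls = append(r.calls, "PullImage:"+image)
	return nil
}

func (r *recordingRuntime) CreateContainer(sandboxID, name, image string) (string, error) {
	id := sandboxID + "/" + name
	r.calls = append(r.calls, "CreateContainer:"+id)
	return id, nil
}

func (r *recordingRuntime) StartContainer(id string) error {
	r.calls = append(r.calls, "StartContainer:"+id)
	return nil
}

// syncPod creates the sandbox first, then pulls, creates, and starts
// each container inside it, stopping at the first error.
func syncPod(rt criRuntime, pod string, specs []containerSpec) error {
	sandboxID, err := rt.RunPodSandbox(pod)
	if err != nil {
		return fmt.Errorf("run sandbox: %w", err)
	}
	for _, c := range specs {
		if err := rt.PullImage(c.Image); err != nil {
			return fmt.Errorf("pull %s: %w", c.Image, err)
		}
		id, err := rt.CreateContainer(sandboxID, c.Name, c.Image)
		if err != nil {
			return fmt.Errorf("create %s: %w", c.Name, err)
		}
		if err := rt.StartContainer(id); err != nil {
			return fmt.Errorf("start %s: %w", c.Name, err)
		}
	}
	return nil
}

func main() {
	rt := &recordingRuntime{}
	err := syncPod(rt, "web", []containerSpec{{Name: "nginx", Image: "nginx:1.25"}})
	fmt.Println(rt.calls, err)
}
```

Because the caller depends only on the interface, swapping the fake for a client of a real runtime would not change `syncPod`, which is the decoupling CRI is meant to provide.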
Next, taking Containerd as an example, let's see how it handles the kubelet's requests.
Containerd and CRI
Containerd's criService implements RuntimeServiceServer and ImageServiceServer, the server sides of the CRI interfaces RuntimeService and ImageService.
criService is further wrapped into instrumentedService, which ensures that all operations are executed in the k8s.io namespace.
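The wrapping idea can be illustrated with a small decorator. This is a sketch under stated assumptions, not Containerd's implementation: `sandboxRunner`, `coreService`, `namespacedService`, `withNamespace`, `namespaceFrom`, and `runDemo` are hypothetical names, and the namespace is carried in a plain `context.Context` value instead of Containerd's own namespaces package.

```go
package main

import (
	"context"
	"fmt"
)

// nsKey is a private context key used only by this sketch.
type nsKey struct{}

const k8sNamespace = "k8s.io"

// withNamespace returns a context that carries the given namespace.
func withNamespace(ctx context.Context, ns string) context.Context {
	return context.WithValue(ctx, nsKey{}, ns)
}

// namespaceFrom reads the namespace back, or "" if none was set.
func namespaceFrom(ctx context.Context) string {
	ns, _ := ctx.Value(nsKey{}).(string)
	return ns
}

// sandboxRunner is a hypothetical one-method slice of the runtime service.
type sandboxRunner interface {
	RunPodSandbox(ctx context.Context, name string) (string, error)
}

// coreService stands in for the inner service; it reports the namespace it sees.
type coreService struct{}

func (coreService) RunPodSandbox(ctx context.Context, name string) (string, error) {
	return namespaceFrom(ctx) + "/" + name, nil
}

// namespacedService decorates another sandboxRunner and pins every call
// to the k8s.io namespace before delegating.
type namespacedService struct{ inner sandboxRunner }

func (s namespacedService) RunPodSandbox(ctx context.Context, name string) (string, error) {
	return s.inner.RunPodSandbox(withNamespace(ctx, k8sNamespace), name)
}

// runDemo calls the wrapped service with a context that has no namespace.
func runDemo() string {
	svc := namespacedService{inner: coreService{}}
	id, _ := svc.RunPodSandbox(context.Background(), "web")
	return id
}

func main() {
	fmt.Println(runDemo()) // k8s.io/web
}
```

Putting the namespace in a wrapper keeps the inner service free of that concern: callers cannot forget to set it, and the core logic never has to check for it.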
RuntimeServiceServer
type RuntimeServiceServer interface {
	// Version returns the runtime name, runtime version, and runtime API version.
	Version(context.Context, *VersionRequest) (*VersionResponse, error)
	// RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
	// the sandbox is in the ready state on success.
	RunPodSandbox(context.Context, *RunPodSandboxRequest) (*RunPodSandboxResponse, error)
	// StopPodSandbox stops any running process that is part of the sandbox and
	// reclaims network resources (e.g., IP addresses) allocated to the sandbox.
	// If there are any running containers in the sandbox, they must be forcibly
	// terminated.
	// This call is idempotent, and must not return an error if all relevant
	// resources have already been reclaimed. kubelet will call StopPodSandbox
	// at least once before calling RemovePodSandbox. It will also attempt to
	// reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
	// multiple StopPodSandbox calls are expected.
	StopPodSandbox(context.Context, *StopPodSandboxRequest) (*StopPodSandboxResponse, error)
	// RemovePodSandbox removes the sandbox. If there are any running containers
	// in the sandbox, they must be forcibly terminated and removed.
	// This call is idempotent, and must not return an error if the sandbox has
	// already been removed.
	RemovePodSandbox(context.Context, *RemovePodSandboxRequest) (*RemovePodSandboxResponse, error)
	// PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
	// present, returns an error.
	PodSandboxStatus(context.Context, *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error)
	// ListPodSandbox returns a list of PodSandboxes.
	ListPodSandbox(context.Context, *ListPodSandboxRequest) (*ListPodSandboxResponse, error)
	// CreateContainer creates a new container in specified PodSandbox
	CreateContainer(context.Context, *CreateContainerRequest) (*CreateContainerResponse, error)
	// StartContainer starts the container.
	StartContainer(context.Context, *StartContainerRequest) (*StartContainerResponse, error)
	// StopContainer stops a running container with a grace period (i.e., timeout).
	// This call is idempotent, and must not return an error if the container has
	// already been stopped.
	// The runtime must forcibly kill the container after the grace period is
	// reached.
	StopContainer(context.Context, *StopContainerRequest) (*StopContainerResponse, error)
	// RemoveContainer removes the container. If the container is running, the
	// container must be forcibly removed.
	// This call is idempotent, and must not return an error if the container has
	// already been removed.
	RemoveContainer(context.Context, *RemoveContainerRequest) (*RemoveContainerResponse, error)
	// ListContainers lists all containers by filters.
	ListContainers(context.Context, *ListContainersRequest) (*ListContainersResponse, error)
	// ContainerStatus returns status of the container. If the container is not
	// present, returns an error.
	ContainerStatus(context.Context, *ContainerStatusRequest) (*ContainerStatusResponse, error)
	// UpdateContainerResources updates ContainerConfig of the container synchronously.
	// If runtime fails to transactionally update the requested resources, an error is returned.
	UpdateContainerResources(context.Context, *UpdateContainerResourcesRequest) (*UpdateContainerResourcesResponse, error)
	// ReopenContainerLog asks runtime to reopen the stdout/stderr log file
	// for the container. This is often called after the log file has been
	// rotated. If the container is not running, container runtime can choose
	// to either create a new log file and return nil, or return an error.
	// Once it returns error, new container log file MUST NOT be created.
	ReopenContainerLog(context.Context, *ReopenContainerLogRequest) (*ReopenContainerLogResponse, error)
	// ExecSync runs a command in a container synchronously.
	ExecSync(context.Context, *ExecSyncRequest) (*ExecSyncResponse, error)
	// Exec prepares a streaming endpoint to execute a command in the container.
	Exec(context.Context, *ExecRequest) (*ExecResponse, error)
	// Attach prepares a streaming endpoint to attach to a running container.
	Attach(context.Context, *AttachRequest) (*AttachResponse, error)
	// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
	PortForward(context.Context, *PortForwardRequest) (*PortForwardResponse, error)
	// ContainerStats returns stats of the container. If the container does not
	// exist, the call returns an error.
	ContainerStats(context.Context, *ContainerStatsRequest) (*ContainerStatsResponse, error)
	// ListContainerStats returns stats of all running containers.
	ListContainerStats(context.Context, *ListContainerStatsRequest) (*ListContainerStatsResponse, error)
	// PodSandboxStats returns stats of the pod sandbox. If the pod sandbox does not
	// exist, the call returns an error.
	PodSandboxStats(context.Context, *PodSandboxStatsRequest) (*PodSandboxStatsResponse, error)
	// ListPodSandboxStats returns stats of the pod sandboxes matching a filter.
	ListPodSandboxStats(context.Context, *ListPodSandboxStatsRequest) (*ListPodSandboxStatsResponse, error)
	// UpdateRuntimeConfig updates the runtime configuration based on the given request.
	UpdateRuntimeConfig(context.Context, *UpdateRuntimeConfigRequest) (*UpdateRuntimeConfigResponse, error)
	// Status returns the status of the runtime.
	Status(context.Context, *StatusRequest) (*StatusResponse, error)
	// CheckpointContainer checkpoints a container
	CheckpointContainer(context.Context, *CheckpointContainerRequest) (*CheckpointContainerResponse, error)
	// GetContainerEvents gets container events from the CRI runtime
	GetContainerEvents(*GetEventsRequest, RuntimeService_GetContainerEventsServer) error
}
ImageServiceServer
type ImageServiceServer interface {
	// ListImages lists existing images.
	ListImages(context.Context, *ListImagesRequest) (*ListImagesResponse, error)
	// ImageStatus returns the status of the image. If the image is not
	// present, returns a response with ImageStatusResponse.Image set to
	// nil.
	ImageStatus(context.Context, *ImageStatusRequest) (*ImageStatusResponse, error)
	// PullImage pulls an image with authentication config.
	PullImage(context.Context, *PullImageRequest) (*PullImageResponse, error)
	// RemoveImage removes the image.
	// This call is idempotent, and must not return an error if the image has
	// already been removed.
	RemoveImage(context.Context, *RemoveImageRequest) (*RemoveImageResponse, error)
	// ImageFSInfo returns information of the filesystem that is used to store images.
	ImageFsInfo(context.Context, *ImageFsInfoRequest) (*ImageFsInfoResponse, error)
}
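Two contracts in the interface comments are easy to overlook: ImageStatus reports a missing image as a nil image rather than an error, and RemoveImage must be idempotent. The sketch below shows one way an implementation could honor both. It is a toy in-memory store, not Containerd's image service; `memoryImageStore`, `image`, and the simplified string-based signatures are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// image is a hypothetical, trimmed-down image record.
type image struct{ Ref string }

// memoryImageStore is an in-memory stand-in for an image service.
type memoryImageStore struct {
	images map[string]*image
}

func newMemoryImageStore() *memoryImageStore {
	return &memoryImageStore{images: make(map[string]*image)}
}

// PullImage records the image; a real runtime would fetch and unpack it.
func (s *memoryImageStore) PullImage(ref string) (string, error) {
	if ref == "" {
		return "", errors.New("empty image reference")
	}
	s.images[ref] = &image{Ref: ref}
	return ref, nil
}

// ImageStatus returns (nil, nil) for an unknown image, mirroring the
// "Image set to nil" contract instead of reporting an error.
func (s *memoryImageStore) ImageStatus(ref string) (*image, error) {
	return s.images[ref], nil
}

// RemoveImage is idempotent: removing a missing image is not an error.
func (s *memoryImageStore) RemoveImage(ref string) error {
	delete(s.images, ref)
	return nil
}

func main() {
	store := newMemoryImageStore()
	if _, err := store.PullImage("busybox:1.36"); err != nil {
		fmt.Println("pull failed:", err)
		return
	}
	img, _ := store.ImageStatus("busybox:1.36")
	fmt.Println(img != nil)                        // true
	fmt.Println(store.RemoveImage("busybox:1.36")) // <nil>
	fmt.Println(store.RemoveImage("busybox:1.36")) // <nil>
	img, _ = store.ImageStatus("busybox:1.36")
	fmt.Println(img != nil) // false
}
```

Idempotent removal matters because the caller may retry after a timeout or a restart without knowing whether the first attempt succeeded.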
Below, let's look at the Containerd source code, using sandbox creation as the example.
Containerd Source Code Analysis
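The request to create a sandbox container arrives through CRI's UDS (Unix domain socket) endpoint /runtime.v1.RuntimeService/RunPodSandbox and enters the criService handling flow. criService#RunPodSandbox() is responsible for creating and running the sandbox container and for making sure the container ends up in a healthy state: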
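- Pull the sandbox container image
- Initialize the container metadata
- Initialize the pod network namespace; for details, see my earlier article "Source code walkthrough: CNI usage as seen from the kubelet and the container runtime"
- Update the container metadata
- Write to the file system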
Source references
- pkg/cri/server/sandbox_run.go#L61
- services/tasks/local.go#L156
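A multi-step flow like the one listed above has to decide what happens when a middle step fails. The following sketch runs steps in order and, on failure, undoes the completed ones in reverse. It is an illustration of that general pattern only: the `step` type, `runSteps`, `sandboxSteps`, and the step names (paraphrased from the list) are hypothetical, and nothing here is copied from Containerd's RunPodSandbox.

```go
package main

import (
	"errors"
	"fmt"
)

// step pairs an action with the cleanup that reverses it.
type step struct {
	name string
	do   func() error
	undo func()
}

// runSteps executes steps in order. If one fails, the steps that already
// completed are undone in reverse order and the wrapped error is returned.
func runSteps(steps []step) (err error) {
	var done []step
	defer func() {
		if err == nil {
			return
		}
		for i := len(done) - 1; i >= 0; i-- {
			done[i].undo()
		}
	}()
	for _, s := range steps {
		if err = s.do(); err != nil {
			return fmt.Errorf("%s: %w", s.name, err)
		}
		done = append(done, s)
	}
	return nil
}

// sandboxSteps builds the five steps and records what happens in trace.
// The step named failAt (if any) fails, to simulate an error.
func sandboxSteps(trace *[]string, failAt string) []step {
	names := []string{
		"ensure sandbox image",
		"init sandbox metadata",
		"set up network namespace",
		"update sandbox metadata",
		"write sandbox files",
	}
	steps := make([]step, 0, len(names))
	for _, n := range names {
		n := n // per-iteration copy captured by the closures below
		steps = append(steps, step{
			name: n,
			do: func() error {
				if n == failAt {
					return errors.New("simulated failure")
				}
				*trace = append(*trace, "do "+n)
				return nil
			},
			undo: func() { *trace = append(*trace, "undo "+n) },
		})
	}
	return steps
}

func main() {
	var trace []string
	err := runSteps(sandboxSteps(&trace, "set up network namespace"))
	fmt.Println(err)
	for _, line := range trace {
		fmt.Println(line)
	}
}
```

With the simulated failure at the network step, the trace shows the first two steps being performed and then undone in reverse order, so a failed sandbox creation leaves nothing half-built behind.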
Summary
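CRI provides a standardized interface for interacting with the underlying container runtime. This matters a great deal for developing and growing the Kubernetes ecosystem:
- The Kubernetes control plane is decoupled from the concrete implementation of container management, so the container runtime can be upgraded or swapped independently, which makes extension and optimization easier.
- As a cross-cloud, cross-platform, multi-environment container orchestration system, Kubernetes uses different container platforms in different environments and scenarios. CRI safeguards that diversity and flexibility.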