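Background

Containers encapsulate an application's dependencies so that applications and services run repeatably and reliably, without the overhead of a full virtual machine. If you have ever spent a day provisioning a server with a large number of packages for a scientific or deep learning application, or spent weeks making sure your application builds and deploys across multiple Linux environments, Docker containers are well worth your time.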

Installation: add the Docker repository

[root@localhost ~]# sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
Loaded plugins: fastestmirror, langpacks
adding repo from: https://download.docker.com/linux/centos/docker-ce.repo
grabbing file https://download.docker.com/linux/centos/docker-ce.repo to /etc/yum.repos.d/docker-ce.repo
repo saved to /etc/yum.repos.d/docker-ce.repo
[root@localhost ~]# cat /etc/yum.repos.d/docker-ce.repo
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-stable-debuginfo]
name=Docker CE Stable - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/debug-$basearch/stable
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-stable-source]
name=Docker CE Stable - Sources
baseurl=https://download.docker.com/linux/centos/$releasever/source/stable
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-test]
name=Docker CE Test - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/test
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-test-debuginfo]
name=Docker CE Test - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/debug-$basearch/test
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-test-source]
name=Docker CE Test - Sources
baseurl=https://download.docker.com/linux/centos/$releasever/source/test
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-nightly]
name=Docker CE Nightly - $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/$basearch/nightly
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-nightly-debuginfo]
name=Docker CE Nightly - Debuginfo $basearch
baseurl=https://download.docker.com/linux/centos/$releasever/debug-$basearch/nightly
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[docker-ce-nightly-source]
name=Docker CE Nightly - Sources
baseurl=https://download.docker.com/linux/centos/$releasever/source/nightly
enabled=0
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
[root@localhost ~]#
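
In the repository file above, only the docker-ce-stable section has enabled=1. As a quick way to confirm which sections yum will actually use, here is a minimal sketch; the helper name list_enabled_repos is hypothetical, and it assumes the plain key=value layout shown above, with no spaces around the equals sign.

```shell
# list_enabled_repos FILE
# Print the name of every [section] in a yum .repo file whose "enabled" key is 1.
# Hypothetical helper; assumes plain "key=value" lines with no spaces around "=".
list_enabled_repos() {
    awk '
        /^\[.*\]$/    { section = substr($0, 2, length($0) - 2) }
        /^enabled=1$/ { print section }
    ' "$1"
}

# Example, on the machine where the repository was added:
# list_enabled_repos /etc/yum.repos.d/docker-ce.repo
```

The same helper can be pointed at any other file under /etc/yum.repos.d.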

Download the installation packages

[root@localhost ~]# cd docker
[root@localhost docker]# repotrack docker-ce

Install Docker and enable it at boot

[root@localhost docker]# yum install ./*
[root@localhost docker]# systemctl  start docker
[root@localhost docker]# systemctl  enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@localhost docker]#

Configure the nvidia-docker repository

[root@localhost docker]# distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
>    && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
[root@localhost docker]# cat /etc/yum.repos.d/nvidia-docker.repo
[libnvidia-container]
name=libnvidia-container
baseurl=https://nvidia.github.io/libnvidia-container/stable/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[libnvidia-container-experimental]
name=libnvidia-container-experimental
baseurl=https://nvidia.github.io/libnvidia-container/experimental/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-container-runtime]
name=nvidia-container-runtime
baseurl=https://nvidia.github.io/nvidia-container-runtime/stable/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-container-runtime-experimental]
name=nvidia-container-runtime-experimental
baseurl=https://nvidia.github.io/nvidia-container-runtime/experimental/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/nvidia-container-runtime/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-docker]
name=nvidia-docker
baseurl=https://nvidia.github.io/nvidia-docker/centos7/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/nvidia-docker/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[root@localhost docker]#
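
The first command above sources /etc/os-release in a subshell and joins ID and VERSION_ID, which is where the centos7 in the repository paths comes from. A minimal sketch of that step on its own follows; the function name distribution_id and its optional file argument are assumptions added for illustration.

```shell
# distribution_id [FILE]
# Source an os-release style file in a subshell (so its variables do not leak
# into the caller) and print ID immediately followed by VERSION_ID.
# Hypothetical helper; FILE defaults to /etc/os-release.
distribution_id() {
    (
        . "${1:-/etc/os-release}"
        printf '%s%s\n' "$ID" "$VERSION_ID"
    )
}

# Example: print the value that ends up in the repository URL.
# distribution_id
```

Printing the value before running curl makes it easy to spot a distribution string for which no repository file is published.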

Download and install nvidia-docker

[root@localhost ~]# mkdir nvidia-docker2
[root@localhost ~]# cd nvidia-docker2
[root@localhost nvidia-docker2]# yum update -y
[root@localhost nvidia-docker2]# repotrack nvidia-docker2
[root@localhost nvidia-docker2]# yum install ./*
[root@localhost ~]# mkdir nvidia-container-toolkit
[root@localhost ~]# cd nvidia-container-toolkit
[root@localhost nvidia-container-toolkit]# repotrack nvidia-container-toolkit
[root@ai-rd nvidia-container-toolkit]# yum install ./*

Pull the image and save it

[root@localhost ~]# docker pull nvidia/cuda:11.0-base
11.0-base: Pulling from nvidia/cuda
54ee1f796a1e: Pull complete
f7bfea53ad12: Pull complete
46d371e02073: Pull complete
b66c17bbf772: Pull complete
3642f1a6dfb3: Pull complete
e5ce55b8b4b9: Pull complete
155bc0332b0a: Pull complete
Digest: sha256:774ca3d612de15213102c2dbbba55df44dc5cf9870ca2be6c6e9c627fa63d67a
Status: Downloaded newer image for nvidia/cuda:11.0-base
docker.io/nvidia/cuda:11.0-base
[root@localhost ~]# docker images
REPOSITORY    TAG         IMAGE ID       CREATED         SIZE
nvidia/cuda   11.0-base   2ec708416bb8   15 months ago   122MB
[root@localhost ~]# docker save -o cuda-11.0.tar nvidia/cuda:11.0-base
[root@localhost ~]# ls cuda-11.0.tar
cuda-11.0.tar
[root@localhost ~]#

Import the image on the server to be tested

[root@ai-rd cby]# docker load -i cuda-11.0.tar
2ce3c188c38d: Loading layer [==================================================>]  75.23MB/75.23MB
ad44aa179b33: Loading layer [==================================================>]  1.011MB/1.011MB
35a91a75d24b: Loading layer [==================================================>]  15.36kB/15.36kB
a4399aeb9a0e: Loading layer [==================================================>]  3.072kB/3.072kB
fa39d0e9f3dc: Loading layer [==================================================>]  18.84MB/18.84MB
232fb43df6ad: Loading layer [==================================================>]  30.08MB/30.08MB
0da51e35db05: Loading layer [==================================================>]  22.53kB/22.53kB
Loaded image: nvidia/cuda:11.0-base
[root@ai-rd cby]# docker images | grep cuda
nvidia/cuda                          11.0-base   2ec708416bb8   15 months ago   122MB
[root@ai-rd cby]#
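
The image is saved on one machine and loaded on another, so cuda-11.0.tar has to be copied between them; the original steps do not show how. Whatever transfer method is used, a checksum can confirm the file arrived intact before docker load is run. The sketch below assumes GNU sha256sum is available on both machines, and both helper names are hypothetical.

```shell
# write_checksum FILE
# Record FILE's SHA-256 next to it as FILE.sha256 (run on the source machine).
write_checksum() {
    (
        cd "$(dirname "$1")" &&
        sha256sum "$(basename "$1")" > "$(basename "$1").sha256"
    )
}

# verify_checksum FILE
# Succeed only if FILE still matches FILE.sha256 (run on the target machine
# after copying both files into the same directory).
verify_checksum() {
    (
        cd "$(dirname "$1")" &&
        sha256sum -c "$(basename "$1").sha256" > /dev/null 2>&1
    )
}

# Example:
# write_checksum cuda-11.0.tar                                 # on the build machine
# verify_checksum cuda-11.0.tar && docker load -i cuda-11.0.tar   # on the test server
```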

Install the kernel headers and upgrade the kernel

[root@ai-rd cby]# yum install kernel-headers
[root@ai-rd cby]# yum install kernel-devel
[root@ai-rd cby]# yum update kernel*

Disable the nouveau module and update the boot image

[root@ai-rd cby]# vim /etc/modprobe.d/blacklist-nouveau.conf
[root@ai-rd cby]# cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
[root@ai-rd cby]# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@ai-rd cby]# sudo dracut -v /boot/initramfs-$(uname -r).img $(uname -r)
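
The blacklist file above is created interactively with vim. For scripted setups, the same two lines can be written non-interactively, and /proc/modules can be checked to see whether nouveau is still loaded. This is a minimal sketch; both function names are hypothetical, and the optional file arguments exist only so the helpers can be tried against a scratch file first.

```shell
# write_nouveau_blacklist [FILE]
# Write the two-line blacklist shown above. FILE defaults to the path used in
# this article; writing there requires root.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' \
        > "${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
}

# nouveau_loaded [FILE]
# Succeed if a module named nouveau is listed in FILE, which defaults to
# /proc/modules (the kernel's list of currently loaded modules).
nouveau_loaded() {
    grep -q '^nouveau ' "${1:-/proc/modules}"
}

# Example:
# nouveau_loaded && echo "nouveau is still loaded"
```

The blacklist only stops the module from loading in future, so nouveau_loaded is expected to keep succeeding until the module is unloaded, typically by rebooting after the initramfs has been rebuilt.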

Download and install the driver

[root@localhost ~]# wget https://cn.download.nvidia.cn/tesla/450.156.00/NVIDIA-Linux-x86_64-450.156.00.run
[root@ai-rd cby]# chmod +x NVIDIA-Linux-x86_64-450.156.00.run
[root@ai-rd cby]# ./NVIDIA-Linux-x86_64-450.156.00.run

Configure Docker

[root@ai-rd ~]# vim /etc/docker/daemon.json
[root@ai-rd ~]# cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
[root@ai-rd ~]# systemctl daemon-reload
[root@ai-rd ~]# systemctl  restart docker
[root@ai-rd ~]#
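
After the restart, it is worth confirming that Docker picked up the nvidia runtime. The sketch below is a simple text check rather than a real JSON parser: it reads JSON on standard input and looks for a "nvidia" key. The function name is hypothetical, and the docker info format string in the example is given on the assumption that the installed Docker version exposes its runtimes that way.

```shell
# has_nvidia_runtime
# Read JSON text on stdin and succeed if it contains a "nvidia" key.
# Hypothetical helper; a plain pattern match, not a JSON parser.
has_nvidia_runtime() {
    grep -Eq '"nvidia"[[:space:]]*:'
}

# Examples:
# has_nvidia_runtime < /etc/docker/daemon.json
# docker info --format '{{json .Runtimes}}' | has_nvidia_runtime
```

The first example checks the file that was edited; the second checks what the running daemon reports, which is the more meaningful test after systemctl restart docker.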

Test GPU access from inside Docker

[root@ai-rd ~]# sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Tue Nov 23 06:03:04 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.156.00   Driver Version: 450.156.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:86:00.0 Off |                    0 |
| N/A   90C    P0    34W /  70W |      0MiB / 15109MiB |      6%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@ai-rd ~]#
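
For automated checks, the driver and CUDA versions can be pulled out of the nvidia-smi banner shown above instead of being read by eye. This is a minimal sketch that parses the banner text with sed; the function name smi_field is hypothetical, and it assumes the banner keeps the "Name: value" layout seen in this output.

```shell
# smi_field NAME
# Read nvidia-smi output on stdin and print the version number that follows
# "NAME:" in the banner, for example "Driver Version" or "CUDA Version".
# Hypothetical helper; assumes the banner layout shown above.
smi_field() {
    sed -n "s/.*$1: *\([0-9.][0-9.]*\).*/\1/p" | head -n 1
}

# Example:
# docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi | smi_field "Driver Version"
```

If the container cannot reach the GPU, nvidia-smi prints no banner and the helper prints nothing, which a script can treat as a failure.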
