Operating system: Ubuntu 16.04/18.04

Install the Nvidia Driver

The graphics-drivers PPA is the recommended way to install the Nvidia driver.

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

Check which Nvidia driver is recommended:

ubuntu-drivers devices

Install the Nvidia driver (the commands below are for an RTX 2060):

# Ubuntu 16.04: the search only finds 430, for CUDA < 10.2
apt-cache search nvidia
sudo apt install nvidia-430
# Ubuntu 18.04: the search finds 440, for CUDA <= 10.2
apt-cache search nvidia | grep ^nvidia-driver
sudo apt install nvidia-driver-440

For the CUDA version that corresponds to each driver, see CUDA Compatibility.
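
The driver/CUDA pairing can also be checked in a script. Below is a minimal sketch, not an official tool: the function names are made up for this post, and the minimum driver versions are my reading of NVIDIA's CUDA Compatibility table for Linux, so verify them there before relying on them.

```shell
#!/bin/sh
# Minimum Linux driver per CUDA toolkit version.
# ASSUMPTION: values taken from NVIDIA's CUDA Compatibility table; verify there.
min_driver_for_cuda() {
  case "$1" in
    10.0) echo 410.48 ;;
    10.1) echo 418.39 ;;
    10.2) echo 440.33 ;;
    *)    return 1 ;;   # unknown toolkit version: give no answer rather than guess
  esac
}

# version_ge A B: succeeds when dotted version A >= B, compared field by field.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# driver_supports_cuda DRIVER CUDA: exit 0 if supported, 1 if too old,
# 2 if the CUDA version is not in the table above.
driver_supports_cuda() {
  min=$(min_driver_for_cuda "$2") || return 2
  version_ge "$1" "$min"
}
```

For example, `driver_supports_cuda 440.82 10.2 && echo ok` succeeds for the driver installed above, while a 430-series driver fails the same check for CUDA 10.2.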

Finally, reboot with sudo reboot. Afterwards, run nvidia-smi to print the Nvidia driver information:

$ nvidia-smi
Fri Apr 17 07:31:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8     5W /  N/A |    263MiB /  5934MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1560      G   /usr/lib/xorg/Xorg                           144MiB |
|    0      1726      G   /usr/bin/gnome-shell                          76MiB |
|    0      2063      G   ...uest-channel-token=10544833948196615517    39MiB |
+-----------------------------------------------------------------------------+

If you plan to install the CUDA Toolkit, read CUDA Compatibility first. The CUDA Toolkit installer bundles its own driver version, so watch out for it: it is best to install the toolkit and the driver separately, and to pick a suitable driver version directly from the official website.
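
When scripting this check, the driver version and the CUDA version can be pulled out of the nvidia-smi header. A small sketch follows; `parse_smi_header` is a name invented here, and it assumes the header layout shown in this post, which may differ across driver releases.

```shell
#!/bin/sh
# parse_smi_header: reads nvidia-smi's table output on stdin and prints
# "DRIVER_VERSION CUDA_VERSION" taken from the first header line that has both.
# ASSUMPTION: the header looks like the 440.82 output shown in this post.
parse_smi_header() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p' |
    head -n 1
}
```

Typical use would be `nvidia-smi | parse_smi_header`. If only the driver version is needed, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` avoids parsing the table at all.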

Install Docker

# update the apt package index
sudo apt-get update
# install packages to allow apt to use a repository over HTTPS
sudo apt-get install apt-transport-https ca-certificates curl gnupg2 software-properties-common
# add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# set up the stable repository
sudo add-apt-repository \
  "deb [arch=amd64] https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/ubuntu \
  $(lsb_release -cs) \
  stable"
# update the apt package index
sudo apt-get update
# install the latest version of Docker Engine and containerd
sudo apt-get install docker-ce docker-ce-cli containerd.io

Next, make Docker usable by a non-root user (log out and back in afterwards so the new group membership takes effect):

sudo groupadd docker
sudo usermod -aG docker $USER
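
A quick way to confirm that the current session really has the docker group is to look at the output of `id -nG`. The helper below is only a sketch; `has_group` is a name made up for this post.

```shell
#!/bin/sh
# has_group GROUP [GROUP_LIST]: succeeds when GROUP is one of the
# space-separated names in GROUP_LIST. GROUP_LIST defaults to the groups of
# the current session as reported by `id -nG`.
has_group() {
  groups_list=${2-$(id -nG)}
  case " $groups_list " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}
```

Running `has_group docker || echo "log out and back in first"` right after usermod will usually print the hint, because group changes only apply to new login sessions.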

References

  • Install Docker Engine on Ubuntu
  • Docker CE Tsinghua mirror
  • Post-installation steps for Linux

Install Nvidia Docker

# add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
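
The `distribution` value is simply ID and VERSION_ID from /etc/os-release glued together, which gives ubuntu18.04 on Ubuntu 18.04. The sketch below shows the same lookup as two small functions so the resulting repository URL can be inspected before anything is downloaded; the function names are invented for this post.

```shell
#!/bin/sh
# distribution_id [FILE]: prints $ID$VERSION_ID from an os-release style file
# (default: /etc/os-release). The body runs in a subshell, so the sourced
# variables do not leak into the calling shell.
distribution_id() (
  . "${1:-/etc/os-release}"
  echo "$ID$VERSION_ID"
)

# nvidia_docker_list_url [FILE]: prints the .list URL used in the commands above.
nvidia_docker_list_url() {
  echo "https://nvidia.github.io/nvidia-docker/$(distribution_id "$1")/nvidia-docker.list"
}
```

Calling `nvidia_docker_list_url` with no argument prints the URL for the current machine.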

Usage

# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi
# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi

$ docker run --gpus all nvidia/cuda:10.2-base-ubuntu16.04 nvidia-smi
Unable to find image 'nvidia/cuda:10.2-base-ubuntu16.04' locally
10.2-base-ubuntu16.04: Pulling from nvidia/cuda
976a760c94fc: Pull complete
c58992f3c37b: Pull complete
0ca0e5e7f12e: Pull complete
f2a274cc00ca: Pull complete
708a53113e13: Pull complete
7dde2dc03189: Pull complete
2d21d4aba891: Pull complete
Digest: sha256:1423b386bb4f950d12b3b0f3ad51eba42d754ee73f8fc4a60657a1904993b68c
Status: Downloaded newer image for nvidia/cuda:10.2-base-ubuntu16.04
Fri Apr 24 08:17:26 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   38C    P8    10W /  N/A |    523MiB /  5934MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
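
The nested quoting in `--gpus '"device=1,2"'` is easy to get wrong. As far as I understand, Docker splits the `--gpus` value on commas, so the inner double quotes are what keep the device list together as one field. The sketch below builds that value from a list of GPU indices or UUIDs; `gpus_device_arg` is a hypothetical helper name, not part of Docker.

```shell
#!/bin/sh
# gpus_device_arg ID...: prints the value to pass to `docker run --gpus`
# for specific devices, e.g. "device=1,2". The double quotes are part of
# the printed value on purpose.
gpus_device_arg() {
  ids=$(IFS=,; echo "$*")     # join all arguments with commas
  printf '"device=%s"\n' "$ids"
}
```

It would be used as `docker run --gpus "$(gpus_device_arg 1 2)" nvidia/cuda:10.0-base nvidia-smi`, which passes the same string as the hand-quoted form above.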

References

  • nvidia-docker
  • nvidia/cuda

Closing

Go coding!

