
Installing the CUDA Toolkit for an NVIDIA GPU on Linux

After getting the machine, the first thing I did was run ubuntu-drivers devices to list the available driver versions:

╰─➤  ubuntu-drivers devices

ERROR:root:aplay command not found
== /sys/devices/pci0000:ae/0000:ae:00.0/0000:af:00.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor   : NVIDIA Corporation
model    : TU104GL [Tesla T4]
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-525-server - distro non-free
driver   : nvidia-driver-525 - distro non-free
driver   : nvidia-driver-450-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

References:

  • Some questions about installing the NVIDIA graphics driver with ubuntu-drivers
  • Answers to "Some questions about installing the NVIDIA graphics driver with ubuntu-drivers"

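Because I am running Ubuntu Server rather than Ubuntu Desktop, I want a driver package with the -server suffix.

I also want the newest version. The newest one listed is 535, so that is the one I picked: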

sudo apt install -y nvidia-driver-535-server
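
The choice above follows a simple rule: keep only the -server packages and take the highest version number. Here is a minimal sketch of that rule as a shell function that reads the ubuntu-drivers devices output from stdin. The function name pick_newest_server_driver is made up for this example, and it assumes the package names follow the nvidia-driver-<version>-server pattern seen in the listing.

```shell
# Print the newest "-server" driver package found in `ubuntu-drivers devices` output.
# Reads the command output from stdin, so it can also be tried on saved text.
# (Hypothetical helper; assumes names look like nvidia-driver-<version>-server.)
pick_newest_server_driver() {
    # Keep only the package names, sort numerically on the version field
    # (third field when splitting on "-"), and take the last (largest) one.
    grep -o 'nvidia-driver-[0-9]*-server' | sort -t- -k3,3n | tail -n 1
}

# Usage on a real machine (commented out because it needs ubuntu-drivers):
# ubuntu-drivers devices 2>/dev/null | pick_newest_server_driver
```

Fed the listing above, this should select nvidia-driver-535-server, the same package installed by hand.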

Once the installation finishes, reboot the machine:

sudo reboot

After the machine comes back up, run nvidia-smi to check the GPU information:

╰─➤  nvidia-smi
Mon Sep 18 14:30:16 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:AF:00.0 Off |                    0 |
| N/A   47C    P0              27W /  70W |      2MiB / 15360MiB |      6%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

As you can see, the GPU is now detected.

That means the card is ready to use.
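
To check the same thing from a script, the two version fields can be pulled out of the nvidia-smi banner. This is only a sketch: the function name smi_versions is invented here, and it assumes the banner keeps the "Driver Version: ... CUDA Version: ..." layout shown above. Note that the CUDA Version field reports the highest CUDA version the installed driver supports; it does not mean a CUDA Toolkit is installed.

```shell
# Extract the "Driver Version" and "CUDA Version" fields from nvidia-smi's banner.
# Reads nvidia-smi output on stdin and prints "driver=<x> cuda=<y>".
# (Hypothetical helper; assumes the banner layout shown in the output above.)
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}

# Usage on a machine with the driver installed (commented out: needs a GPU):
# nvidia-smi | smi_versions
```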


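So far we have only installed the driver and have not explicitly installed the CUDA Toolkit. Do we have the CUDA Toolkit at this point? For example, could installing the driver have pulled it in automatically?

The answer is no.

Run nvcc --version in your terminal: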

╰─➤  nvcc --version

zsh: command not found: nvcc

The shell reports not found.

So, how do we install the CUDA Toolkit?

Reference: Notes on installing the NVIDIA GPU driver and related components
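
The same check can be done without triggering a shell error by asking whether nvcc is on PATH first. This is a small sketch; the helper name have_cmd is made up for this example.

```shell
# Report whether a command is on PATH; returns 0 if found, non-zero otherwise.
# (Hypothetical helper name.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd nvcc; then
    nvcc --version
else
    echo "nvcc not found: the CUDA Toolkit is not installed or not on PATH"
fi
```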

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.2/local_installers/cuda-repo-ubuntu2204-12-2-local_12.2.2-535.104.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-2-local_12.2.2-535.104.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
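
After the install, nvcc may still come back as not found, because the toolkit's bin directory is not added to PATH automatically. Here is a sketch of the usual post-install step, assuming the toolkit landed under the default /usr/local/cuda prefix (adjust CUDA_HOME if yours differs):

```shell
# Assumed default install prefix; change this if your toolkit lives elsewhere.
CUDA_HOME=/usr/local/cuda

# Prepend the toolkit's bin and lib64 directories for the current shell session.
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Run the compiler only if it is actually present under the assumed prefix.
if [ -x "$CUDA_HOME/bin/nvcc" ]; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin"
fi
```

To make this permanent, the two export lines can go into your shell startup file (for zsh, ~/.zshrc). Also note that the cuda meta-package brings in driver packages as well as the toolkit. If you want to keep the driver installed earlier untouched, the toolkit-only package (cuda-toolkit-12-2 for this release, as far as I know) may be the safer choice.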