Cloud computing: passing an NVIDIA GPU through to a QEMU virtual machine with Linux vfio


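On Linux, passing a GPU through to a virtual machine should be done with vfio, mainly because vfio does a better job of device access control and DMA isolation. The downside is that the physical device can then no longer be used by the host or by any other virtual machine.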

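The qemu command line for using a physical device directly is simple; the real work is configuring the host system, kernel and physical device beforehand.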

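Looking only at the qemu command line, the single difference from starting an ordinary virtual machine is the final -device option. It is easy to read: it hands the host device 0000:00:01.0 to the virtual machine.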

$ qemu-system-x86_64 -m 4096 -smp 4 --enable-kvm \
-drive file=~/guest/fedora.img \
-device vfio-pci,host=0000:00:01.0
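A small sketch of how the -device options can be generated for one or more host devices. The function name qemu_vfio_args is hypothetical, not part of QEMU, and short addresses such as 41:00.0 are assumed to be in the default PCI domain 0000, so they are padded to the full form used above.

```shell
#!/bin/sh
# Hypothetical helper: print one "-device vfio-pci,host=..." option per
# host PCI address passed as an argument.
qemu_vfio_args() (
    out=""
    for bdf in "$@"; do
        # Addresses without a domain (bus:dev.fn) get the default domain 0000.
        case "$bdf" in
            *:*:*) full="$bdf" ;;
            *)     full="0000:$bdf" ;;
        esac
        out="$out -device vfio-pci,host=$full"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${out# }"
)

# Example: qemu-system-x86_64 -m 4096 -smp 4 --enable-kvm ... $(qemu_vfio_args 0000:00:01.0)
```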
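System and hardware preparation
Enable IOMMU in the BIOS
Device passthrough on the x86 platform requires the IOMMU to be enabled. On Intel this is part of VT-d (Virtualization Technology for Directed I/O). Sometimes this feature is not enabled.

The option lives in the BIOS setup under Security -> Virtualization -> VT-d, although its location varies between BIOS versions. Remember to turn it on before using passthrough devices.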
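Before rebooting into the BIOS, you can check from Linux whether VT-d already looks enabled. A minimal sketch, assuming that firmware with VT-d enabled publishes an ACPI DMAR table (Linux exposes ACPI tables under /sys/firmware/acpi/tables/) and typically omits it when VT-d is off; AMD systems use an IVRS table instead. The function name and its optional root prefix (used only for testing) are hypothetical.

```shell
#!/bin/sh
# Succeeds if the ACPI DMAR table is visible. On Intel, this table is what
# VT-d capable firmware publishes when VT-d is turned on.
# $1: optional root prefix, only for testing against a scratch directory.
vtd_table_present() {
    [ -e "${1:-}/sys/firmware/acpi/tables/DMAR" ]
}

# Example on a real host:
#   vtd_table_present && echo "DMAR table found" || echo "VT-d looks disabled"
```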

Enable IOMMU in the kernel configuration
INTEL_IOMMU
  Location:
    -> Device Drivers
      -> IOMMU Hardware Support (IOMMU_SUPPORT [=y])
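On a distribution kernel you can check the option without rebuilding anything, since the build configuration is usually installed as /boot/config-$(uname -r). A minimal sketch; kconfig_enabled is a hypothetical helper name.

```shell
#!/bin/sh
# Succeeds if a kernel config symbol is built in (=y) or built as a module (=m).
kconfig_enabled() {
    # $1: config file; $2: symbol without the CONFIG_ prefix, e.g. INTEL_IOMMU
    grep -Eq "^CONFIG_$2=(y|m)$" "$1"
}

# Example: kconfig_enabled "/boot/config-$(uname -r)" INTEL_IOMMU && echo "INTEL_IOMMU enabled"
```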
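Enable IOMMU via kernel boot parameters
Turning it on in the BIOS and selecting it in the kernel build options is not enough: you also need to add kernel boot parameters in the boot loader.

Edit /etc/default/grub and set GRUB_CMDLINE_LINUX. Of the parameters below, vfio_iommu_type1.allow_unsafe_interrupts=1 is only needed on platforms without interrupt remapping, and it weakens the isolation between guest and host: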

$ cat /etc/default/grub

GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 rd.driver.blacklist=nouveau nouveau.modeset=0"

Regenerate the grub boot configuration file:

$ grub2-mkconfig -o /boot/grub2/grub.cfg
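Once the machine has rebooted, /proc/cmdline shows the parameters the running kernel actually received. A sketch of a check for one exact parameter; cmdline_has is a hypothetical helper, and its optional file argument exists only for testing.

```shell
#!/bin/sh
# Succeeds if the exact parameter (e.g. intel_iommu=on) is on the kernel
# command line. $2: optional file to read instead of /proc/cmdline.
cmdline_has() {
    # Split the single-line command line into one parameter per line,
    # then look for an exact whole-line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example:
#   for p in intel_iommu=on iommu=pt; do cmdline_has "$p" || echo "missing: $p"; done
```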

Load the vfio-related modules at boot:

$ cat /etc/modules-load.d/vfio.conf
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
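After the next boot, /proc/modules can confirm that these modules were loaded. A minimal sketch; modules_loaded is a hypothetical helper, and the note about vfio_virqfd is an assumption about recent kernels, where it was folded into the vfio module.

```shell
#!/bin/sh
# Prints each listed module that is missing from the modules file and fails
# if any is missing. The first column of /proc/modules is the module name,
# spelled with underscores (vfio_pci, not vfio-pci).
# Assumption: on recent kernels vfio_virqfd is part of vfio and is no longer
# listed separately, so leave it out of the check there.
modules_loaded() {
    # $1: modules file (normally /proc/modules); remaining args: module names
    mods_file="$1"; shift
    mods_missing=0
    for m in "$@"; do
        if ! cut -d ' ' -f 1 "$mods_file" | grep -qxF -- "$m"; then
            echo "not loaded: $m"
            mods_missing=1
        fi
    done
    [ "$mods_missing" -eq 0 ]
}

# Example: modules_loaded /proc/modules vfio vfio_iommu_type1 vfio_pci
```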
Setting up IOMMU Kernel parameters

Find the NVIDIA GPU's bus ID
Record the PCI addresses and hardware IDs of the GPU:

$ lspci -k | grep -i nvidia -A 3
41:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)

    Subsystem: Device 1b4c:11bf
    Kernel driver in use: vfio-pci
    Kernel modules: nouveau

41:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)

    Subsystem: Device 1b4c:11bf
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel

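PCI addresses => 41:00.0, 41:00.1

subsystem ID => 1b4c:11bf (the card vendor's subsystem ID, not the vendor:device pair that vfio-pci matches on; lspci -nn shows those: 10de:1c82 for the GPU and 10de:0fb9 for the audio function)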

Two NVIDIA functions show up here, the GPU itself and its audio controller, and both carry the subsystem ID 1b4c:11bf. Binding vfio-pci by ID, as with the ids= option further below, has one limitation:

vfio-pci uses your vendor and device ID pair to identify which device it needs to bind to at boot.

If you have two GPUs sharing such an ID pair, you will not be able to get your passthrough driver to bind with just one of them.
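To see whether a machine actually has this problem, you can look for vendor:device IDs shared by several VGA controllers in lspci -nn output. A sketch, assuming the [vvvv:dddd] bracket format that lspci -nn prints; duplicate_vga_ids is a hypothetical name.

```shell
#!/bin/sh
# Reads `lspci -nn` output on stdin and prints every vendor:device ID that
# more than one VGA controller shares. Such IDs cannot be split between
# host and guest with "options vfio-pci ids=...".
duplicate_vga_ids() {
    grep 'VGA compatible controller' |
        grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}]' |   # e.g. [10de:1c82]
        sed 's/[][]//g' |                            # strip the brackets
        sort | uniq -d                               # keep only repeated IDs
}

# Example: lspci -nn | duplicate_vga_ids
```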

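The following script handles that case by choosing devices by address rather than by ID: every GPU that is not the boot GPU (boot_vga is 0), plus its audio function, gets vfio-pci as its driver_override: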

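$ cat /usr/bin/vfio-pci-override.sh

#!/bin/sh

# Bind every GPU that is not the boot GPU (boot_vga == 0), plus the audio
# function in the same slot (.0 -> .1), to vfio-pci via driver_override.
for i in $(find /sys/devices/pci* -name boot_vga); do
    if [ "$(cat "$i")" -eq 0 ]; then
        GPU="${i%/boot_vga}"
        AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
        echo "vfio-pci" > "$GPU/driver_override"
        if [ -d "$AUDIO" ]; then
            echo "vfio-pci" > "$AUDIO/driver_override"
        fi
    fi
done

# -i ignores the "install" line in modprobe.d, so this does not run the script again.
modprobe -i vfio-pci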

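Hook the script into /etc/modprobe.d/vfio.conf so that it runs whenever vfio-pci is loaded (the script must be executable: chmod +x /usr/bin/vfio-pci-override.sh):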

$ cat /etc/modprobe.d/vfio.conf
install vfio-pci /usr/bin/vfio-pci-override.sh
options vfio-pci ids=10de:1c82 disable_vga=1
Managing the GPU with vfio

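/etc/modprobe.d/vfio.conf: ids takes the vendor:device IDs reported by lspci -nn; separate multiple devices with ','. When no two GPUs in the machine share an ID, this option alone is enough and the override script above is not needed.

$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1c82,10de:0fb9 disable_vga=1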
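The ids value can also be assembled from lspci -nn output instead of being copied by hand. A sketch that collects the IDs of every function in one bus:device slot (41:00 here), assuming the lspci -nn line format without the PCI domain; vfio_ids_for_slot is a hypothetical name.

```shell
#!/bin/sh
# Reads `lspci -nn` output on stdin and prints a comma-separated list of the
# vendor:device IDs of all functions in one slot, e.g. 10de:1c82,10de:0fb9.
vfio_ids_for_slot() {
    # $1: bus:device prefix without the function number, e.g. 41:00
    grep "^$1\." |
        grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}]' |
        sed 's/[][]//g' |
        paste -s -d ',' -
}

# Example:
#   echo "options vfio-pci ids=$(lspci -nn | vfio_ids_for_slot 41:00) disable_vga=1"
```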

Disable NVIDIA's open-source nouveau driver in /etc/modprobe.d/blacklist.conf:

$ cat /etc/modprobe.d/blacklist.conf
blacklist nouveau

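Configure the kvm module in /etc/modprobe.d/kvm.conf (ignore_msrs=1 makes KVM ignore guest accesses to model-specific registers it does not handle instead of injecting a fault into the guest):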

$ cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
Reboot. Once the system is up, check that the NVIDIA GPU is now used by the vfio-pci module and confirm that the IOMMU really is enabled:

$ dmesg | grep -e DMAR -e IOMMU | grep enabled

If the output contains

DMAR: IOMMU enabled

the configuration above has taken effect.
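The dmesg check can miss the message if the kernel log buffer has already wrapped. A complementary sketch, assuming that the kernel populates /sys/kernel/iommu_groups/ with numbered group directories only when an IOMMU is active; iommu_group_count and its root prefix are hypothetical.

```shell
#!/bin/sh
# Prints the number of IOMMU groups the kernel has created (0 if none).
# $1: optional root prefix, only for testing.
iommu_group_count() {
    groups_dir="${1:-}/sys/kernel/iommu_groups"
    if [ -d "$groups_dir" ]; then
        find "$groups_dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Example: [ "$(iommu_group_count)" -gt 0 ] && echo "IOMMU groups present"
```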

Check whether the GPU is bound to vfio-pci

Also check that the 41:00.1 Audio device is bound to vfio-pci as well:

$ lspci -k | grep -i -e nvidia -A 3
41:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)

    Subsystem: Device 1b4c:11bf
    Kernel driver in use: vfio-pci    # the GTX 1050 Ti GPU is bound to vfio-pci
    Kernel modules: nouveau

41:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)

    Subsystem: Device 1b4c:11bf
    Kernel driver in use: vfio-pci    # the audio device is now bound to vfio-pci too
    Kernel modules: snd_hda_intel
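The same information is available straight from sysfs, which is easier to use in scripts than parsing lspci -k. A sketch that reads the driver symlink of a PCI function; bound_driver and its root argument are hypothetical.

```shell
#!/bin/sh
# Prints the name of the driver a PCI function is bound to, or "none".
bound_driver() {
    # $1: full PCI address such as 0000:41:00.0; $2: optional root prefix for testing
    link="${2:-}/sys/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../drivers/<name>; keep only <name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Example:
#   for f in 0000:41:00.0 0000:41:00.1; do echo "$f: $(bound_driver "$f")"; done
```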

List the GPU's IOMMU group:

$ find /sys/kernel/iommu_groups/ -type l | grep 41:00
/sys/kernel/iommu_groups/27/devices/0000:41:00.0
/sys/kernel/iommu_groups/27/devices/0000:41:00.1

List the PCI devices in every IOMMU group:

#!/bin/bash
shopt -s nullglob
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
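vfio hands devices to a guest per IOMMU group: roughly, a group is only usable when every device in it is bound to vfio-pci or to no driver at all, which is why both 41:00.0 and 41:00.1 from group 27 are passed through. A sketch that lists the members of one device's group via its iommu_group symlink; group_peers and the root prefix are hypothetical.

```shell
#!/bin/sh
# Lists every device in the same IOMMU group as the given device.
group_peers() {
    # $1: full PCI address, e.g. 0000:41:00.0; $2: optional root prefix for testing
    peers_root="${2:-}"
    group_link="$peers_root/sys/bus/pci/devices/$1/iommu_group"
    [ -L "$group_link" ] || { echo "no IOMMU group for $1" >&2; return 1; }
    # The symlink ends in .../iommu_groups/<N>; keep only <N>.
    group="$(basename "$(readlink "$group_link")")"
    ls "$peers_root/sys/kernel/iommu_groups/$group/devices"
}

# Example: group_peers 0000:41:00.0
```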
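Passing the NVIDIA GPU through with qemu
Prepare a CentOS 7 image, then install NVIDIA's official proprietary driver and the CUDA SDK inside the virtual machine.

The image I copied from the server was in vmdk format, so first convert it to qcow2: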

$ /usr/local/qemu-2.9.0/bin/qemu-img convert -f vmdk -O qcow2 centos-7.3.1611-20180104.vmdk centos-7.3.1611-20180104.qcow2

Start the VM with qemu. Note that -cpu needs the kvm=off parameter:

kvm=off will hide the KVM hypervisor signature; this is required for NVIDIA cards,

since their driver will refuse to work on a hypervisor and result in Code 43 on Windows.

$ cat startvm.sh

#!/bin/sh

/usr/local/qemu-2.9.0/bin/qemu-system-x86_64 -enable-kvm \
-m 4096 -cpu host,kvm=off -smp 4,sockets=1,cores=4,threads=1 \
-drive file=./centos-7.3.1611-20180104.qcow2 \
-device vfio-pci,host=41:00.0,multifunction=on,addr=0x16 \
-device vfio-pci,host=41:00.1 \
-net nic,model=e1000 -net user,hostfwd=tcp::5022-:22 \
-vnc :1
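With the command above, the audio function lands at guest address 00:04.0 while the GPU sits at 00:16.0, as the guest's lspci output below shows. A possible variant (an assumption, not what startvm.sh does) keeps both in one guest slot as functions .0 and .1, mirroring the host's 41:00.0/41:00.1 layout, using the slot.function form of QEMU's addr property. gpu_passthrough_args is a hypothetical helper.

```shell
#!/bin/sh
# Prints -device options that place the GPU and its audio function in the
# same guest slot, as functions 0 and 1.
gpu_passthrough_args() {
    # $1: host GPU address, $2: host audio address, $3: guest slot, e.g. 0x16
    printf '%s %s\n' \
        "-device vfio-pci,host=$1,multifunction=on,addr=$3.0" \
        "-device vfio-pci,host=$2,addr=$3.1"
}

# Example: qemu-system-x86_64 ... $(gpu_passthrough_args 41:00.0 41:00.1 0x16) ...
```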

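This virtual machine has VNC and SSH port forwarding enabled, so it can be reached with either.

Log in to the virtual machine from the host:

$ ssh 127.0.0.1 -p 5022

Check the GPU that was passed through into the virtual machine: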

$ lspci -k | grep -i nvidia -A 3
00:04.0 Audio device: NVIDIA Corporation Device 0fb9 (rev a1)

Subsystem: Device 1b4c:11bf
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

00:16.0 VGA compatible controller: NVIDIA Corporation GP107 (rev a1)

Subsystem: Device 1b4c:11bf
Kernel modules: nouveau

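Installing the NVIDIA driver and CUDA
The NVIDIA driver has to be downloaded from NVIDIA's website. Installing CUDA first would also install the NVIDIA driver along with it; here the virtual machine gets the driver first and CUDA afterwards.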

References: installing-nvidia-drivers-centos-7, NVIDIA CUDA GETTING STARTED GUIDE FOR LINUX

Installing the NVIDIA driver
Download: http://www.nvidia.com/object/…

If the update pulls in a new kernel, reboot afterwards:

$ yum -y update

Install gcc, make, glibc and the other tools and libraries:

$ yum -y groupinstall "Development Tools"
$ yum -y install kernel-devel

Download the latest NVIDIA driver for unix.

$ wget http://us.download.nvidia.com…
$ yum -y install epel-release
$ yum -y install dkms

Edit /etc/default/grub. Append the following to GRUB_CMDLINE_LINUX:

rd.driver.blacklist=nouveau nouveau.modeset=0

Generate a new grub configuration to include the above changes.

$ grub2-mkconfig -o /boot/grub2/grub.cfg

Edit/create /etc/modprobe.d/blacklist.conf and append:

blacklist nouveau

Back up your old initramfs and then build a new one:

$ mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
$ dracut /boot/initramfs-$(uname -r).img $(uname -r)

Reboot again.

Run the NVIDIA driver installer and enter yes to all options.

$ sh NVIDIA-Linux-x86_64-*.run

After installation, reboot once more and use lspci -k to check whether the GPU is now using the nvidia driver:

$ lspci -k | grep -i nvidia -A 3
00:04.0 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
00:16.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)

Kernel driver in use: nvidia    # the nvidia driver is now in use
Kernel modules: nouveau, nvidia_drm, nvidia

Run nvidia-smi to check its output and the GPU temperature:

$ nvidia-smi

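Thu Mar 15 01:31:09 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.42                 Driver Version: 390.42                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:00:16.0 Off |                  N/A |
| 40%   32C    P0    N/A / 100W |      0MiB /  4040MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+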

$ nvidia-smi -q -d TEMPERATURE

==============NVSMI LOG==============

Timestamp : Thu Mar 15 01:32:42 2018
Driver Version : 390.42

Attached GPUs : 1
GPU 00000000:00:16.0

Temperature
    GPU Current Temp            : 32 C
    GPU Shutdown Temp           : 102 C
    GPU Slowdown Temp           : 99 C
    GPU Max Operating Temp      : N/A
    Memory Current Temp         : N/A
    Memory Max Operating Temp   : N/A
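For monitoring scripts, the current temperature can be pulled out of this report. A sketch that parses the text format shown above; gpu_current_temp is a hypothetical helper, and nvidia-smi's own --query-gpu=temperature.gpu --format=csv,noheader is usually the simpler choice.

```shell
#!/bin/sh
# Reads `nvidia-smi -q -d TEMPERATURE` output on stdin and prints the
# current GPU temperature in degrees C (digits only).
gpu_current_temp() {
    # Split on ':' and strip everything but digits from the value column.
    awk -F: '/GPU Current Temp/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Example: nvidia-smi -q -d TEMPERATURE | gpu_current_temp
```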

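Installing CUDA
Download: https://developer.nvidia.com/… I chose the runfile here; for convenience you can later use the rpm (network) option instead, which installs the NVIDIA driver for you automatically.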

$ wget https://developer.nvidia.com/…

Say no to installing the NVIDIA driver.

The standalone driver you already installed is typically newer than what is packaged with CUDA.

Use the default option for all other choices.

$ sh cuda_*.run

Add the CUDA environment variables:

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
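Putting these lines in a shell profile grows PATH each time it is re-sourced, and when LD_LIBRARY_PATH starts out empty the second line leaves a trailing ':', an empty entry that the dynamic loader treats as the current directory. A sketch of an idempotent append; append_once is a hypothetical helper.

```shell
#!/bin/sh
# Prints a colon-separated list with the directory appended, unless it is
# already present. An empty list yields just the directory (no stray ':').
append_once() {
    # $1: current list (e.g. "$PATH"), $2: directory to add
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

# Example:
#   PATH="$(append_once "$PATH" /usr/local/cuda/bin)"
#   LD_LIBRARY_PATH="$(append_once "$LD_LIBRARY_PATH" /usr/local/cuda/lib64)"
#   export PATH LD_LIBRARY_PATH
```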

Build the samples:

$ cd ~/NVIDIA_CUDA-9.1_Samples; make -j 4
$ cd bin/x86_64/linux/release
$ ./deviceQuery    # query GPU information
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 4040 MBytes (4236312576 bytes)
(6) Multiprocessors, (128) CUDA Cores/MP: 768 CUDA Cores
GPU Max Clock rate: 1481 MHz (1.48 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes

$ ./bandwidthTest    # measure GPU bandwidth with CUDA
Running on…

Device 0: GeForce GTX 1050 Ti
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 9719.0

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 9215.8

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 95525.1

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when
