This post summarizes how to set up NVIDIA vGPU on Proxmox.
Version: Proxmox 8.2
# echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" >> /etc/apt/sources.list
# apt update
# apt -y install proxmox-kernel-6.5 proxmox-headers-6.5 build-essential dkms mdevctl zip
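Because the `echo ... >>` step appends unconditionally, running it twice leaves a duplicate repository entry. The following is a minimal sketch of an idempotent append; `add_line_once` is a hypothetical helper name, and the demo writes to a scratch file rather than to /etc/apt/sources.list.

```shell
#!/bin/sh
# add_line_once FILE LINE: append LINE to FILE only if no identical line exists.
# (Hypothetical helper, not part of Proxmox or apt.)
add_line_once() {
    file=$1
    line=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demo against a scratch file; on a real host the target would be
# /etc/apt/sources.list.
demo=$(mktemp)
repo='deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription'
add_line_once "$demo" "$repo"
add_line_once "$demo" "$repo"   # second call is a no-op
cat "$demo"
rm -f "$demo"
```

The whole-line, fixed-string match means a commented-out copy of the entry does not count as present.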
List the installed kernels, then pin a specific version:
# proxmox-boot-tool kernel list
# proxmox-boot-tool kernel pin 6.5.13-6-pve
# nano /etc/default/grub
Add or modify the following line:
+ GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
Apply the changes:
# update-grub
# reboot
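After the reboot it can be worth confirming that the running kernel actually received the new parameters. This sketch checks a command-line string for whole-word arguments; `has_kernel_arg` is a hypothetical helper, and reading /proc/cmdline assumes a Linux host.

```shell
#!/bin/sh
# has_kernel_arg CMDLINE ARG: succeed if ARG appears as a whole
# space-separated word in CMDLINE. (Hypothetical helper.)
has_kernel_arg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the live kernel command line; empty if /proc is unavailable.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)
for arg in intel_iommu=on iommu=pt; do
    if has_kernel_arg "$cmdline" "$arg"; then
        echo "$arg: present"
    else
        echo "$arg: missing"
    fi
done
```

Matching on whole words avoids a false positive where `iommu=on` would otherwise match inside `intel_iommu=on`.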
# echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules
# echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
Rebuild the initramfs, then reboot the system:
# update-initramfs -u -k all
# reboot
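Once the host is back up, the module list can be checked against what was written to /etc/modules and the blacklist. This is a sketch only: `module_loaded` is a hypothetical helper that reads the /proc/modules format (module name in the first column). `vfio_virqfd` is left out of the loop on the assumption that kernels from 6.2 onward fold it into `vfio`, so it would not appear as a separate module on the 6.5 kernel used here.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is the first field of a line
# in FILE. FILE defaults to /proc/modules. (Hypothetical helper.)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# The vfio modules are expected to be loaded; nouveau is expected to be absent.
for m in vfio vfio_iommu_type1 vfio_pci nouveau; do
    if module_loaded "$m" 2>/dev/null; then
        echo "$m: loaded"
    else
        echo "$m: not loaded"
    fi
done
```

Modules compiled directly into the kernel do not show up in /proc/modules, so "not loaded" here only means "not loaded as a module".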
# dmesg | grep -e DMAR -e IOMMU
Example output:
[ 1.714214] DMAR: IOMMU enabled
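Beyond the `DMAR: IOMMU enabled` message, the kernel exposes the resulting IOMMU groups under sysfs, which gives a second way to confirm that the IOMMU is active. The sketch below lists them; `list_iommu_groups` is a hypothetical helper, and the optional root argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# list_iommu_groups [ROOT]: print "group <n>: <pci address>" for every
# device under ROOT (default /sys/kernel/iommu_groups). (Hypothetical helper.)
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no groups found
        group=${dev#"$root"/}          # strip the root prefix
        group=${group%%/*}             # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}

list_iommu_groups
```

An empty result suggests that the IOMMU is not enabled in the firmware or on the kernel command line.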
Download the NVIDIA GRID vGPU driver package and extract it:
# cd ~/NVIDIA
# unzip NVIDIA-GRID-Linux-KVM-550.54.16-550.54.15-551.78.zip
Install the host driver:
# cd Host_Drivers
# sh NVIDIA-Linux-x86_64-550.54.16-vgpu-kvm.run
Verify the installation:
# nvidia-smi
Example output:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.16              Driver Version: 550.54.16      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    Off |   00000000:0D:00.0 Off |                    0 |
| N/A   36C    P0            122W /  350W |       0MiB /  46068MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S                    Off |   00000000:B5:00.0 Off |                    0 |
| N/A   37C    P0            124W /  350W |       0MiB /  46068MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
# nvidia-smi vgpu --supported
Example output:
GPU 00000000:0D:00.0
NVIDIA L40S-1B
NVIDIA L40S-2B
NVIDIA L40S-1Q
NVIDIA L40S-2Q
NVIDIA L40S-3Q
NVIDIA L40S-4Q
NVIDIA L40S-6Q
NVIDIA L40S-8Q
NVIDIA L40S-12Q
NVIDIA L40S-16Q
NVIDIA L40S-24Q
NVIDIA L40S-48Q
NVIDIA L40S-1A
NVIDIA L40S-2A
NVIDIA L40S-3A
NVIDIA L40S-4A
NVIDIA L40S-6A
NVIDIA L40S-8A
NVIDIA L40S-12A
NVIDIA L40S-16A
NVIDIA L40S-24A
NVIDIA L40S-48A
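The profile list is long, so filtering by series makes it easier to read. In NVIDIA's naming, the trailing letter indicates the series (Q for virtual workstation, B for virtual PC, A for virtual applications) and the number is the frame buffer size in GB. The sketch below filters profile names on stdin; `profiles_by_series` is a hypothetical helper, and the demo feeds it a few names copied from the listing above instead of calling `nvidia-smi`.

```shell
#!/bin/sh
# profiles_by_series LETTER: read profile names on stdin and print those
# whose name ends in "-<digits><LETTER>". (Hypothetical helper.)
profiles_by_series() {
    grep -E -- "-[0-9]+$1[[:space:]]*\$"
}

# Demo with sample names taken from the listing above.
printf '%s\n' \
    'GPU 00000000:0D:00.0' \
    'NVIDIA L40S-2B' \
    'NVIDIA L40S-8Q' \
    'NVIDIA L40S-24Q' \
    'NVIDIA L40S-4A' | profiles_by_series Q
```

On a host with the driver installed, the same function could be fed from `nvidia-smi vgpu --supported`.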
# /usr/lib/nvidia/sriov-manage -e b5:00.0
After enabling the virtual functions, list the GPUs again:
# lspci | grep NVIDIA
Example output:
0d:00.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:00.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:00.4 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:00.5 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:00.6 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:00.7 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.1 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.2 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.3 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.4 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.5 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.6 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:01.7 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.1 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.2 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.3 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.4 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.5 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.6 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:02.7 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.1 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.2 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.3 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.4 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.5 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.6 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:03.7 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:04.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:04.1 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:04.2 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
b5:04.3 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)
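The listing above has 33 entries on bus b5: the physical function at b5:00.0 plus 32 virtual functions. Counting them by eye is error-prone, so the sketch below counts `lspci` lines for a given bus. `count_functions` is a hypothetical helper, and it assumes the default `lspci` format in which each line starts with `bus:device.function`.

```shell
#!/bin/sh
# count_functions BUS: read lspci lines on stdin and print how many start
# with "BUS:" (case-insensitive). (Hypothetical helper.)
count_functions() {
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c -i "^$1:" || true
}

# Demo with a shortened sample of the listing above.
printf '%s\n' \
    '0d:00.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)' \
    'b5:00.0 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)' \
    'b5:00.4 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)' \
    'b5:00.5 3D controller: NVIDIA Corporation AD102GL [L40S] (rev a1)' |
    count_functions b5
```

On the host itself, the input would come from `lspci | grep NVIDIA` instead of the sample lines.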
# cat << EOF > /etc/systemd/system/nvidia-sriov.service
[Unit]
Description=Enable NVIDIA SR-IOV
After=network.target nvidia-vgpud.service nvidia-vgpu-mgr.service
Before=pve-guests.service
[Service]
Type=oneshot
ExecStart=/usr/lib/nvidia/sriov-manage -e ALL
ExecStartPre=/bin/sleep 5
[Install]
WantedBy=multi-user.target
EOF
Reload systemd and enable the service:
# systemctl daemon-reload
# systemctl enable --now nvidia-sriov.service
Reboot after applying the settings:
# reboot
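After the reboot, one way to confirm that the service re-enabled SR-IOV is to read the kernel's `sriov_numvfs` attribute for each physical GPU. This is a sketch under assumptions: `vf_count` is a hypothetical helper, the address 0000:b5:00.0 comes from the example above, and the optional root argument exists only for testing against a fake directory.

```shell
#!/bin/sh
# vf_count BDF [ROOT]: print the number of enabled virtual functions for
# the PCI device BDF (e.g. 0000:b5:00.0). ROOT defaults to
# /sys/bus/pci/devices. (Hypothetical helper.)
vf_count() {
    f="${2:-/sys/bus/pci/devices}/$1/sriov_numvfs"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "no sriov_numvfs attribute for $1" >&2
        return 1
    fi
}

vf_count 0000:b5:00.0 || echo "SR-IOV does not appear to be available on this device"
```

A value of 0 would suggest the service did not run or ran before the NVIDIA vGPU services were ready; `systemctl status nvidia-sriov.service` shows what happened in that case.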