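Running nvidia-smi sometimes takes several seconds to respond. nvidia-smi queries the GPU state each time it runs, and the device initialization that happens during that query can cause the delay.

The nvidia-persistenced service keeps the GPU state initialized and can prevent this delay. It plays a role similar to nvidia-smi -pm 1, but it maintains the GPU state continuously for as long as the daemon is running, so it can provide stronger GPU state persistence than nvidia-smi -pm 1 alone.

Command | Function | Description | Persists after reboot |
---|---|---|---|
nvidia-smi -pm 1 | Enables Persistence Mode | Keeps the GPU powered even when it is not in use, so nvidia-smi responds faster | ❌ (cleared on reboot) |
nvidia-persistenced | GPU state persistence service | Similar to nvidia-smi -pm 1, but keeps the GPU state through a daemon process | ❌ (cleared on reboot) |
Both (via systemd) | Persistence Mode + GPU state persistence | Running nvidia-persistenced and also enabling nvidia-smi -pm 1 minimizes the GPU initialization delay | ✅ (kept if autostart is configured) |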
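Before changing anything, it can help to see which GPUs currently have persistence mode disabled. The following is a minimal sketch, assuming that `nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader` prints lines such as `0, Disabled`. The helper name `gpus_without_persistence` is hypothetical and not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "index, persistence_mode" CSV lines on stdin
# and prints the index of every GPU whose persistence mode is Disabled.
gpus_without_persistence() {
    awk -F', *' '$2 == "Disabled" { print $1 }'
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
        | gpus_without_persistence
fi
```

An empty result would suggest that every GPU already has persistence mode enabled.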
# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          Off |   00000000:1B:00.0 Off |                    0 |
| N/A   58C    P0             95W /  300W |       1MiB /  81920MiB |      3%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
===========================================================================================
# nvidia-smi -pm 1
==================================================
Enabled persistence mode for GPU 00000000:1B:00.0.
All done.
==================================================
# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe           On |   00000000:1B:00.0 Off |                    0 |
| N/A   58C    P0             86W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
===========================================================================================
# nvidia-persistenced
# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe           On |   00000000:1B:00.0 Off |                    0 |
| N/A   58C    P0             86W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
===========================================================================================
# ps aux | grep nvidia-persistenced
=======================================================================================
root 10103 0.0 0.0 18536 1912 ? Ss 20:57 0:00 nvidia-persistenced
root 10231 0.0 0.0 222016 1092 pts/1 S+ 20:57 0:00 grep --color=auto nvidia-persistenced
=======================================================================================
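The `ps aux | grep` output above also lists the `grep` process itself. As a small sketch of one way to avoid that, the hypothetical helper `persistenced_pids` below matches only on the command column (field 11 of `ps aux` output) and prints the PID. It assumes the daemon appears as `nvidia-persistenced` or as a full path ending in that name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ps aux` style lines on stdin and prints the PID
# (column 2) of every process whose command (column 11) is nvidia-persistenced.
# The grep process is skipped because its command column is "grep".
persistenced_pids() {
    awk '$11 == "nvidia-persistenced" || $11 ~ /\/nvidia-persistenced$/ { print $2 }'
}

# Run against the live process table when ps is available.
if command -v ps >/dev/null 2>&1; then
    ps aux | persistenced_pids
fi
```

With the sample output shown above, only the daemon's PID is printed and the `grep` line is ignored.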
# vim /etc/systemd/system/nvidia-persistenced.service
=====================================================
[Unit]
Description=NVIDIA Persistence Daemon
After=multi-user.target
[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced
ExecStop=/usr/bin/nvidia-persistenced --stop
Restart=always
[Install]
WantedBy=multi-user.target
=====================================================
# systemctl daemon-reload
# systemctl enable nvidia-persistenced
# systemctl start nvidia-persistenced
# reboot
# systemctl status nvidia-persistenced
===============================================================================================
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/etc/systemd/system/nvidia-persistenced.service; enabled; vendor preset: di>
Active: active (running) since Fri 2024-12-27 21:02:06 KST; 14s ago
Process: 6671 ExecStart=/usr/bin/nvidia-persistenced (code=exited, status=0/SUCCESS)
Main PID: 6672 (nvidia-persiste)
Tasks: 1 (limit: 3355442)
Memory: 952.0K
CGroup: /system.slice/nvidia-persistenced.service
└─6672 /usr/bin/nvidia-persistenced
Dec 27 21:02:05 node01 systemd[1]: Starting NVIDIA Persistence Daemon...
Dec 27 21:02:05 node01 nvidia-persistenced[6672]: Started (6672)
Dec 27 21:02:06 node01 systemd[1]: Started NVIDIA Persistence Daemon.
===============================================================================================
# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe           On |   00000000:1B:00.0 Off |                    0 |
| N/A   50C    P0             53W /  300W |       1MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
===========================================================================================
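The post-reboot checks above (unit enabled, service active, Persistence-M showing On) can be combined into one summary. This is only a sketch: the helper `persistence_summary` is hypothetical, and it assumes that `systemctl is-enabled` prints `enabled`, `systemctl is-active` prints `active`, and the `persistence_mode` query field of nvidia-smi prints `Enabled`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarizes whether the setup survived the reboot.
#   $1 = output of `systemctl is-enabled nvidia-persistenced`
#   $2 = output of `systemctl is-active nvidia-persistenced`
#   $3 = persistence_mode value reported by nvidia-smi for one GPU
persistence_summary() {
    if [ "$1" = "enabled" ] && [ "$2" = "active" ] && [ "$3" = "Enabled" ]; then
        echo "OK"
    else
        echo "CHECK: unit=$1 state=$2 persistence=$3"
    fi
}

# Only run the live check when both tools exist on this host.
if command -v systemctl >/dev/null 2>&1 && command -v nvidia-smi >/dev/null 2>&1; then
    persistence_summary \
        "$(systemctl is-enabled nvidia-persistenced 2>/dev/null)" \
        "$(systemctl is-active nvidia-persistenced 2>/dev/null)" \
        "$(nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | head -n 1)"
fi
```

The live check only inspects the first GPU; on a multi-GPU node each line of the query output would need the same comparison.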