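Running nvidia-smi sometimes stalls for several seconds before it responds. The delay can come from device initialization: when no process is holding the GPU, the driver has to initialize the device before nvidia-smi can read its state.

The nvidia-persistenced service prevents this initialization delay by keeping the GPU state loaded. It plays a role similar to nvidia-smi -pm 1, but the state is held by a daemon for as long as that daemon is running, which can provide more robust GPU state retention than nvidia-smi -pm 1 alone.

| Command | Function | Description | Survives reboot? |
|---|---|---|---|
| nvidia-smi -pm 1 | Enables Persistence Mode | Keeps the GPU initialized (driver state loaded) even when it is not in use, so nvidia-smi responds quickly | ❌ (cleared on reboot) |
| nvidia-persistenced | GPU state retention service | Similar to nvidia-smi -pm 1, but a daemon process holds the GPU state | ❌ (cleared on reboot) |
| Both (via systemd) | Persistence Mode + GPU state retention | Running nvidia-persistenced and also enabling nvidia-smi -pm 1 minimizes the GPU initialization delay | ✅ (kept if configured to start automatically) |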
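To confirm that the delay is actually present on a given node, the wall-clock time of nvidia-smi can be measured before and after enabling persistence. The sketch below is illustrative only: `elapsed_ms` is a hypothetical helper name, it assumes GNU `date` with nanosecond support (`%N`), and it skips the measurement when nvidia-smi is not installed.

```shell
# elapsed_ms: hypothetical helper that prints how long a command took, in milliseconds.
# Assumes GNU date (the %N format is not available in BSD date).
elapsed_ms() {
    local start end
    start=$(date +%s%N)
    # Only the duration matters here, so output and exit status are ignored.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

if command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi took $(elapsed_ms nvidia-smi) ms"
else
    echo "nvidia-smi not found; skipping measurement"
fi
```

Running this once with Persistence-M reported as Off and once after it is On should make any difference in startup time visible; the actual numbers depend on the driver, the GPU count, and the host.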
# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe Off | 00000000:1B:00.0 Off | 0 |
| N/A 58C P0 95W / 300W | 1MiB / 81920MiB | 3% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
===========================================================================================

# nvidia-smi -pm 1
==================================================
Enabled persistence mode for GPU 00000000:1B:00.0.
All done.
==================================================

# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:1B:00.0 Off | 0 |
| N/A 58C P0 86W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
===========================================================================================

# nvidia-persistenced
# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:1B:00.0 Off | 0 |
| N/A 58C P0 86W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
===========================================================================================

# ps aux | grep nvidia-persistenced
=======================================================================================
root 10103 0.0 0.0 18536 1912 ? Ss 20:57 0:00 nvidia-persistenced
root 10231 0.0 0.0 222016 1092 pts/1 S+ 20:57 0:00 grep --color=auto nvidia-persistenced
=======================================================================================

# vim /etc/systemd/system/nvidia-persistenced.service
=====================================================
[Unit]
Description=NVIDIA Persistence Daemon
After=multi-user.target
[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced
ExecStop=/usr/bin/nvidia-persistenced --stop
Restart=always
[Install]
WantedBy=multi-user.target
=====================================================

# systemctl daemon-reload
# systemctl enable nvidia-persistenced
# systemctl start nvidia-persistenced

# reboot

# systemctl status nvidia-persistenced
===============================================================================================
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/etc/systemd/system/nvidia-persistenced.service; enabled; vendor preset: di>
Active: active (running) since Fri 2024-12-27 21:02:06 KST; 14s ago
Process: 6671 ExecStart=/usr/bin/nvidia-persistenced (code=exited, status=0/SUCCESS)
Main PID: 6672 (nvidia-persiste)
Tasks: 1 (limit: 3355442)
Memory: 952.0K
CGroup: /system.slice/nvidia-persistenced.service
└─6672 /usr/bin/nvidia-persistenced
Dec 27 21:02:05 node01 systemd[1]: Starting NVIDIA Persistence Daemon...
Dec 27 21:02:05 node01 nvidia-persistenced[6672]: Started (6672)
Dec 27 21:02:06 node01 systemd[1]: Started NVIDIA Persistence Daemon.
===============================================================================================

# nvidia-smi
===========================================================================================
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08 Driver Version: 550.127.08 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:1B:00.0 Off | 0 |
| N/A 50C P0 53W / 300W | 1MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
===========================================================================================
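
After the reboot, the two things checked by hand above (the service state and the Persistence-M column) can also be checked from a script. This is a minimal sketch, assuming the unit is named nvidia-persistenced as in this guide. `count_not_enabled` is a hypothetical helper name, and `persistence_mode` is the query field listed by `nvidia-smi --help-query-gpu`. When nvidia-smi is not installed the check is skipped.

```shell
# count_not_enabled: hypothetical helper. Reads "index, persistence_mode" CSV lines
# on stdin and prints how many GPUs do not report "Enabled".
count_not_enabled() {
    awk -F', *' '$2 != "Enabled" { n++ } END { print n + 0 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Is the unit enabled at boot, and is it running right now?
    systemctl is-enabled nvidia-persistenced || true
    systemctl is-active nvidia-persistenced || true
    # Script-friendly equivalent of reading the Persistence-M column.
    bad=$(nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | count_not_enabled)
    echo "GPUs without persistence mode: $bad"
else
    echo "nvidia-smi not found; nothing to check"
fi
```

A count of 0 corresponds to every GPU showing On in the Persistence-M column.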