
[Rocky] NVIDIA MIG (Multi-Instance GPU) Configuration, Creation, and Deletion (2)






1. Overview

This post shows how to enable NVIDIA MIG and how to create and delete MIG GPU instances (GIs) and compute instances (CIs).







2. Versions and Specifications

Rocky Linux 9.2
NVIDIA A100 80GB PCIe







3. Reference Links





3-1. [Rocky] What Is NVIDIA MIG (Multi-Instance GPU)? (1)

BLOG
YouTube




3-2. [Rocky] Installing the NVIDIA Graphics Driver

BLOG
YouTube







4. MIG





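4-1. Enabling MIG

Check the current state, enable MIG mode on GPU 0, and then reset the GPU so that the mode change takes effect.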

# nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:03:00.0 Off |                    0 |
| N/A   43C    P0              68W / 300W |      4MiB / 81920MiB |     24%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+


# nvidia-smi -i 0 -mig 1
# nvidia-smi --gpu-reset

GPU 00000000:03:00.0 was successfully reset.
All done.
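On the A100, the `-mig 1` change typically stays pending until the GPU is reset (or the host is rebooted), which is why `--gpu-reset` follows it. A minimal sketch for checking whether a reset is still needed, assuming `nvidia-smi --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader` prints one line per GPU such as `Disabled, Enabled`; the function name `mig_reset_needed` is hypothetical.

```shell
# Hypothetical helper: reads "current, pending" MIG mode lines (one per GPU)
# from stdin and prints "yes" if any GPU has a pending mode that differs from
# its current mode, otherwise "no".
mig_reset_needed() {
  local current pending
  while IFS=',' read -r current pending; do
    # Strip whitespace so " Enabled" compares equal to "Enabled".
    current="${current//[[:space:]]/}"
    pending="${pending//[[:space:]]/}"
    if [ -n "$current" ] && [ "$current" != "$pending" ]; then
      echo yes
      return 0
    fi
  done
  echo no
}

# Example usage on a real system (requires the NVIDIA driver):
#   nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv,noheader | mig_reset_needed
```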




4-2. Verifying the MIG Mode

# nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:03:00.0 Off |                   On |
| N/A   44C    P0              73W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  No MIG devices found                                                                 |
+---------------------------------------------------------------------------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+




4-3. Checking MIG Profiles

# nvidia-smi mig -lgip

+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     7/7        9.50       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.50       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.20gb       15     4/4        19.50      No     14     1     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.50      No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     2/2        39.25      No     42     2     0   |
|                                                             3     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.25      No     56     2     0   |
|                                                             4     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     1/1        78.75      No     98     5     0   |
|                                                             7     1     1   |
+-----------------------------------------------------------------------------+
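When scripting MIG layouts, the Free/Total column can be read programmatically instead of by eye. A minimal sketch, assuming the `nvidia-smi mig -lgip` table keeps the whitespace-separated layout shown above (each profile row starts with `|`, the GPU index, and `MIG <name>`); both function names are hypothetical.

```shell
# Hypothetical helper: print "<name> <profile-id> <free>/<total>" for every
# profile row of `nvidia-smi mig -lgip` read from stdin.
mig_profiles_summary() {
  awk '$1 == "|" && $3 == "MIG" { print $4, $5, $6 }'
}

# Hypothetical helper: print "<name> <profile-id>" only for profiles that
# still have at least one free instance.
mig_profiles_available() {
  awk '$1 == "|" && $3 == "MIG" { split($6, ft, "/"); if (ft[1] + 0 > 0) print $4, $5 }'
}

# Example usage on a MIG-enabled system:
#   nvidia-smi mig -lgip | mig_profiles_available
```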


# nvidia-smi mig -lgipp

GPU  0 Profile ID 19 Placements: {0,1,2,3,4,5,6}:1
GPU  0 Profile ID 20 Placements: {0,1,2,3,4,5,6}:1
GPU  0 Profile ID 15 Placements: {0,2,4,6}:2
GPU  0 Profile ID 14 Placements: {0,2,4}:2
GPU  0 Profile ID  9 Placements: {0,4}:4
GPU  0 Profile ID  5 Placement : {0}:4
GPU  0 Profile ID  0 Placement : {0}:8
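In the `-lgipp` output, `{0,2,4,6}:2` reads as: an instance of this profile can start at slice 0, 2, 4, or 6 and occupies 2 slices. The same start:size form appears in the Placement column of `nvidia-smi mig -lgi`. A minimal sketch that expands each line into one start:size pair per row, assuming the line format shown above; `mig_expand_placements` is a hypothetical name.

```shell
# Hypothetical helper: expand `nvidia-smi mig -lgipp` lines such as
#   GPU  0 Profile ID 15 Placements: {0,2,4,6}:2
# into one "<profile-id> <start>:<size>" line per possible placement.
mig_expand_placements() {
  awk '/Placement/ {
    id = $5                      # profile ID (5th whitespace-separated field)
    split($NF, parts, ":")       # "{0,2,4,6}:2" -> "{0,2,4,6}" and "2"
    gsub(/[{}]/, "", parts[1])   # drop the braces around the start list
    n = split(parts[1], starts, ",")
    for (i = 1; i <= n; i++) print id, starts[i] ":" parts[2]
  }'
}

# Example usage on a MIG-enabled system:
#   nvidia-smi mig -lgipp | mig_expand_placements
```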







5. Creating MIG GIs (GPU Instances)





5-1. Creating a GI (GPU Instance): Method 1

Create a GI by its MIG profile ID (ID 15 is MIG 1g.20gb in the profile table above).


# nvidia-smi mig -cgi 15




5-2. Verifying GI (GPU Instance) Creation

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
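The GI ID (Instance ID column) is what later `-cci`, `-dci`, and `-dgi` commands take, so it is often useful to extract it in a script. A minimal sketch, assuming the `nvidia-smi mig -lgi` layout shown above; `mig_list_gis` is a hypothetical name.

```shell
# Hypothetical helper: print "<gi-id> <profile-name> <start:size>" for each
# GPU instance row of `nvidia-smi mig -lgi` read from stdin.
mig_list_gis() {
  awk '$1 == "|" && $3 == "MIG" { print $6, $4, $7 }'
}

# Example usage on a MIG-enabled system:
#   nvidia-smi mig -lgi | mig_list_gis
```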




5-3. Checking How Many More GIs Can Be Created

# nvidia-smi mig -lgip

+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     6/7        9.50       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.50       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.20gb       15     3/4        19.50      No     14     1     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.50      No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     1/2        39.25      No     42     2     0   |
|                                                             3     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.25      No     56     2     0   |
|                                                             4     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     0/1        78.75      No     98     5     0   |
|                                                             7     1     1   |
+-----------------------------------------------------------------------------+
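The new Free counts follow from the placements: the 1g.20gb GI created at 6:2 blocks every placement that overlaps slices 6 and 7, so 3g.40gb keeps only 0:4 (1/2 free) and 7g.80gb (0:8) drops to 0/1, while 2g.20gb (0:2, 2:2, 4:2) is unaffected. A minimal sketch of that overlap test, treating each placement as the half-open range [start, start+size); `mig_placement_overlaps` is a hypothetical name.

```shell
# Hypothetical helper: returns 0 (true) if the candidate placement
# "<start>:<size>" overlaps any of the occupied placements given after it.
mig_placement_overlaps() {
  local cand=$1; shift
  local cs=${cand%%:*} cz=${cand##*:}
  local p os oz
  for p in "$@"; do
    os=${p%%:*}; oz=${p##*:}
    # Two ranges overlap when each one starts before the other one ends.
    if [ "$cs" -lt $((os + oz)) ] && [ "$os" -lt $((cs + cz)) ]; then
      return 0
    fi
  done
  return 1
}

# Example: can a 3g.40gb GI still start at slice 4 after 1g.20gb took 6:2?
#   mig_placement_overlaps 4:4 6:2 && echo "blocked"
```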




5-4. Creating GIs (GPU Instances): Methods 2 and 3

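Create GIs by MIG profile name, using either the short name (1g.10gb, method 2) or the full name ("MIG 1g.10gb", method 3). Several profiles can be passed at once as a comma-separated list; the command below creates two 1g.10gb GIs.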


# nvidia-smi mig -cgi 1g.10gb,"MIG 1g.10gb"
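Profile names follow a `<N>g.<M>gb` pattern: N compute slices and M GB of memory (compare 1g.20gb with 14 SMs to 2g.20gb with 28 in the profile table). A minimal sketch that splits a short or full name into those two numbers; `mig_profile_parts` is a hypothetical name, and it simply drops suffixes such as `+me`.

```shell
# Hypothetical helper: print "<compute-slices> <memory-GB>" for a MIG
# profile name given in short ("1g.10gb") or full ("MIG 1g.10gb") form.
mig_profile_parts() {
  local name=${1#MIG }   # accept both "MIG 1g.10gb" and "1g.10gb"
  name=${name%%+*}       # drop suffixes such as "+me"
  local g=${name%%g.*}   # "1g.10gb" -> "1"
  local mem=${name#*g.}  # "1g.10gb" -> "10gb"
  mem=${mem%gb}          # "10gb"    -> "10"
  echo "$g $mem"
}

# Example:
#   mig_profile_parts "MIG 3g.40gb"
```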




5-5. Verifying GI (GPU Instance) Creation

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+







6. Creating MIG CIs (Compute Instances)





6-1. Checking the GIs (GPU Instances)

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+




6-2. Creating a Single CI (Compute Instance)

# nvidia-smi mig -cci -gi 11




6-3. Verifying CI (Compute Instance) Creation

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+




6-4. Creating CIs (Compute Instances) on Multiple GIs at Once

# nvidia-smi mig -cci -gi 12,6
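The comma-separated `-gi` list can also be built from `nvidia-smi mig -lgi`. A minimal sketch, assuming the table layout shown in 6-1; `mig_gi_id_list` is a hypothetical name. It collects every GI ID, so in the state of 6-1 it would also return 11, which already has a CI after 6-2; filter the list in that case.

```shell
# Hypothetical helper: join the GI IDs from `nvidia-smi mig -lgi` (stdin)
# into a comma-separated list such as "11,12,6".
mig_gi_id_list() {
  awk '$1 == "|" && $3 == "MIG" { ids = ids sep $6; sep = "," } END { print ids }'
}

# Example usage on a MIG-enabled system where no GI has a CI yet:
#   nvidia-smi mig -cci -gi "$(nvidia-smi mig -lgi | mig_gi_id_list)"
```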




6-5. Verifying CI (Compute Instance) Creation

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+







7. Creating a MIG GI (GPU Instance) and CI (Compute Instance)




7-1. Creating a GI (GPU Instance) and Its CI (Compute Instance) in One Step

# nvidia-smi mig -cgi 19 -C
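When several profiles are passed to `-cgi ... -C` at once, a quick sanity check of the layout can save a failed call. A minimal sketch, assuming the A100's 7 compute slices and short `<N>g.<M>gb` profile names; it checks only the compute-slice total, not slice placement, so nvidia-smi may still reject a layout that passes. `mig_check_layout` is a hypothetical name and only prints the command.

```shell
# Hypothetical helper: sum the compute slices of the given short profile
# names and, if the total fits in 7, print the matching -cgi ... -C command.
mig_check_layout() {
  local total=0 p g
  for p in "$@"; do
    g=${p%%g.*}            # "3g.40gb" -> "3"
    total=$((total + g))
  done
  if [ "$total" -le 7 ]; then
    echo "nvidia-smi mig -cgi $(IFS=,; echo "$*") -C"
  else
    echo "too many compute slices: $total > 7" >&2
    return 1
  fi
}

# Example:
#   mig_check_layout 3g.40gb 2g.20gb 1g.10gb 1g.10gb
```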




7-2. Verifying GI (GPU Instance) Creation

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        7          0:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+




7-3. Verifying CI (Compute Instance) Creation

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+







8. Deleting a MIG CI (Compute Instance)





8-1. Checking the CIs (Compute Instances)

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+
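Deleting a CI needs both its CI ID and its parent GI ID, and the two are easy to mix up because both columns are labeled Instance ID. A minimal sketch that turns `nvidia-smi mig -lci` rows into the matching `-dci -ci <CI ID> -gi <GI ID>` commands, assuming the layout shown above; `mig_dci_commands` is a hypothetical name and only prints the commands.

```shell
# Hypothetical helper: for each CI row of `nvidia-smi mig -lci` (stdin),
# print the command that would delete it. Field 3 is the GI ID and
# field 7 is the CI ID in the table layout shown above.
mig_dci_commands() {
  awk '$1 == "|" && $4 == "MIG" { printf "nvidia-smi mig -dci -ci %s -gi %s\n", $7, $3 }'
}

# Example usage on a MIG-enabled system:
#   nvidia-smi mig -lci | mig_dci_commands
```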




8-2. Deleting a CI (Compute Instance)

# nvidia-smi mig -dci -ci 0 -gi 11

Successfully destroyed compute instance ID 0 from GPU 0 instance ID 11




8-3. Verifying CI (Compute Instance) Deletion

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+







9. Deleting a MIG GI (GPU Instance)





9-1. Checking the GIs (GPU Instances)

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        7          0:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+




9-2. Deleting a GI (GPU Instance)

# nvidia-smi mig -dgi -gi 11

Successfully destroyed GPU instance ID 11 from GPU 0




9-3. Verifying GI (GPU Instance) Deletion

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        7          0:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+







10. Deleting All GIs (GPU Instances) and CIs (Compute Instances)





10-1. Deleting All CIs (Compute Instances)

# nvidia-smi mig -dci

Successfully destroyed compute instance ID 0 from GPU 0 instance ID 7
Successfully destroyed compute instance ID 0 from GPU 0 instance ID 12
Successfully destroyed compute instance ID 0 from GPU 0 instance ID 6




10-2. Deleting All GIs (GPU Instances)

# nvidia-smi mig -dgi

Successfully destroyed GPU instance ID 7 from GPU 0
Successfully destroyed GPU instance ID 12 from GPU 0
Successfully destroyed GPU instance ID 6 from GPU 0
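The order here matters: CIs are deleted first and GIs second, because a GI that still contains CIs generally cannot be destroyed. A minimal sketch wrapping that sequence, assuming `nvidia-smi mig` accepts `-i <gpu>` to target a single GPU; `mig_teardown`, `mig_run`, and the `DRY_RUN` variable are hypothetical names.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
mig_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: delete all CIs, then all GIs, on one GPU (default 0).
# Stops before the GI step if deleting the CIs fails.
mig_teardown() {
  local gpu=${1:-0}
  mig_run nvidia-smi mig -i "$gpu" -dci || return 1
  mig_run nvidia-smi mig -i "$gpu" -dgi
}

# Example: preview the commands without touching the GPU.
#   DRY_RUN=1 mig_teardown 0
```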




10-3. Checking the CIs (Compute Instances)

# nvidia-smi mig -lci

No GPU Instances found: Not Found




10-4. Checking the GIs (GPU Instances)

# nvidia-smi mig -lgi

No GPU Instances found: Not Found







11. MIG





11-1. Disabling MIG

# nvidia-smi -i 0 -mig 0





11-2. Verifying That MIG Is Disabled

# nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:03:00.0 Off |                    0 |
| N/A   43C    P0              68W / 300W |      4MiB / 81920MiB |     24%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
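On hosts with several GPUs, the mode can be checked without reading the full table. A minimal sketch, assuming `nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader` prints lines such as `0, Disabled`; `mig_enabled_gpus` is a hypothetical name that prints the index of every GPU whose MIG mode is still Enabled.

```shell
# Hypothetical helper: read "<index>, <mode>" lines from stdin and print the
# index of each GPU whose current MIG mode is "Enabled".
mig_enabled_gpus() {
  awk -F', *' '$2 == "Enabled" { print $1 }'
}

# Example usage on a real system (empty output means MIG is off everywhere):
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader | mig_enabled_gpus
```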



seuheu
