
[Rocky] NVIDIA MIG (Multi-Instance GPU) Setup, Creation, and Deletion (2)






1. Overview

This post covers enabling NVIDIA MIG, then creating and deleting MIG GPU instances (GIs) and compute instances (CIs).







2. Versions and Specs

Rocky-9.2
NVIDIA A100 80GB PCIe







3. Reference Links





3-1. [Rocky] What Is NVIDIA MIG (Multi-Instance GPU)? (1)

BLOG
YouTube




3-2. [Rocky] Installing the NVIDIA Graphics Driver

BLOG
YouTube







4. MIG





4-1. Enable MIG

# nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:03:00.0 Off |                    0 |
| N/A   43C    P0              68W / 300W |      4MiB / 81920MiB |     24%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+


# nvidia-smi -i 0 -mig 1
# nvidia-smi --gpu-reset

GPU 00000000:03:00.0 was successfully reset.
All done.
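On A100 GPUs, a MIG mode change is typically held as "pending" until the GPU is reset (the `--gpu-reset` step above) or the server reboots. As a minimal sketch, the helper below (the name `mig_gpus_needing_reset` is hypothetical) reads the output of `nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending --format=csv,noheader` and prints the index of every GPU whose pending mode still differs from its current mode. It assumes the usual `0, Disabled, Enabled` comma-separated layout.

```shell
# Print the index of every GPU whose pending MIG mode differs from its
# current mode, i.e. GPUs that still need a reset (or reboot).
# Expected input lines: "<index>, <current mode>, <pending mode>"
mig_gpus_needing_reset() {
    awk -F', *' 'NF >= 3 && $2 != $3 { print $1 }'
}

# Example on a real host (run as root):
#   nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending \
#       --format=csv,noheader | mig_gpus_needing_reset
```

An empty result means the requested MIG mode is already active on every GPU.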




4-2. Verify MIG Settings

# nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:03:00.0 Off |                   On |
| N/A   44C    P0              73W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  No MIG devices found                                                                 |
+---------------------------------------------------------------------------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+




4-3. Check MIG Profiles

# nvidia-smi mig -lgip

+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     7/7        9.50       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.50       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.20gb       15     4/4        19.50      No     14     1     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.50      No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     2/2        39.25      No     42     2     0   |
|                                                             3     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.25      No     56     2     0   |
|                                                             4     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     1/1        78.75      No     98     5     0   |
|                                                             7     1     1   |
+-----------------------------------------------------------------------------+


# nvidia-smi mig -lgipp

GPU  0 Profile ID 19 Placements: {0,1,2,3,4,5,6}:1
GPU  0 Profile ID 20 Placements: {0,1,2,3,4,5,6}:1
GPU  0 Profile ID 15 Placements: {0,2,4,6}:2
GPU  0 Profile ID 14 Placements: {0,2,4}:2
GPU  0 Profile ID  9 Placements: {0,4}:4
GPU  0 Profile ID  5 Placement : {0}:4
GPU  0 Profile ID  0 Placement : {0}:8
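In the `-lgipp` output, `{0,2,4,6}:2` lists the slice indices where a GI of that profile may start, followed by how many slices it occupies. The sketch below (hypothetical helper name `mig_expand_placements`) turns each line into explicit slice ranges, which makes overlaps easy to spot: the 1g.20gb GI created later in 5-1 sits at `6:2` (slices 6-7), so the 3g.40gb placement starting at slice 4 (slices 4-7) is blocked, which is why 3g.40gb drops to 1/2 free in 5-3. Parsing assumes the line layout shown above, which may differ between driver versions.

```shell
# Expand each "Placements: {a,b,...}:size" line from `nvidia-smi mig -lgipp`
# into explicit slice ranges, e.g. "profile 15: 0-1 2-3 4-5 6-7".
mig_expand_placements() {
    awk '{
        id = ""
        # The profile ID is the field right after the word "ID".
        for (i = 1; i < NF; i++) if ($i == "ID") { id = $(i + 1); break }
        # Locate the "{starts}:size" part of the line; skip lines without one.
        if (!match($0, /[{][0-9,]*[}]:[0-9]+/)) next
        spec = substr($0, RSTART, RLENGTH)
        split(spec, parts, ":")
        size = parts[2]
        gsub(/[{}]/, "", parts[1])
        n = split(parts[1], starts, ",")
        line = "profile " id ":"
        for (j = 1; j <= n; j++)
            line = line " " starts[j] "-" (starts[j] + size - 1)
        print line
    }'
}

# Example: nvidia-smi mig -lgipp | mig_expand_placements
```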







5. Create MIG GIs (GPU Instances)





5-1. Creating a GI (GPU Instance): Method 1

Create by MIG profile ID


# nvidia-smi mig -cgi 15




5-2. Verify GI (GPU Instance) Creation

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
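Note that the Profile ID (15) and the Instance ID (6) are different numbers; the later `-cci -gi`, `-dci -gi`, and `-dgi -gi` commands all take the Instance ID. A minimal sketch (hypothetical helper `mig_list_gis`) that prints `instance_id profile_name profile_id placement` for each row of the `-lgi` table, assuming the table layout shown above:

```shell
# Print "instance_id profile_name profile_id placement" for every GI row
# in the output of `nvidia-smi mig -lgi`.
# A data row looks like: "|   0  MIG 1g.20gb   15   6   6:2   |"
mig_list_gis() {
    awk '$1 == "|" && $3 == "MIG" { print $6, $4, $5, $7 }'
}

# Example: nvidia-smi mig -lgi | mig_list_gis
```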




5-3. Check How Many More GIs Can Be Created

# nvidia-smi mig -lgip

+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     6/7        9.50       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.50       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.20gb       15     3/4        19.50      No     14     1     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.50      No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     1/2        39.25      No     42     2     0   |
|                                                             3     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.25      No     56     2     0   |
|                                                             4     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     0/1        78.75      No     98     5     0   |
|                                                             7     1     1   |
+-----------------------------------------------------------------------------+
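The Free/Total column is what to check before running `-cgi`. A minimal sketch (hypothetical helper `mig_free_count`) that prints the Free count for one profile ID from the `-lgip` table, assuming the row layout shown above; the commented usage only creates a GI when at least one slot is still free.

```shell
# Print the number of free instances for profile ID $1, read from
# `nvidia-smi mig -lgip` output on stdin.
# A data row looks like: "|   0  MIG 1g.20gb   15   3/4   19.50   No   14   1   0   |"
mig_free_count() {
    awk -v id="$1" '$1 == "|" && $3 == "MIG" && $5 == id {
        split($6, ft, "/")   # "3/4" -> ft[1] = 3 (free), ft[2] = 4 (total)
        print ft[1]
    }'
}

# Example: create a 1g.20gb GI (profile 15) only if one is still free.
#   free=$(nvidia-smi mig -lgip | mig_free_count 15)
#   [ "${free:-0}" -gt 0 ] && nvidia-smi mig -cgi 15
```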




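5-4. Creating a GI (GPU Instance): Methods 2 and 3

Create by MIG profile name, using either the short name (1g.10gb) or the full name ("MIG 1g.10gb"). The command below creates one GI each way.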


# nvidia-smi mig -cgi 1g.10gb,"MIG 1g.10gb"
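The full profile name contains a space, so it must be wrapped in plain ASCII double quotes. Typographic quotes (” ”) are not shell syntax, so the shell splits the argument at the space. A quick way to see how the shell tokenizes a command line is to print each argument in brackets; `show_args` is only an illustrative helper name.

```shell
# Print each argument on its own line in brackets, to show shell word splitting.
show_args() {
    printf '[%s]\n' "$@"
}

show_args nvidia-smi mig -cgi 1g.10gb,"MIG 1g.10gb"
# [nvidia-smi]
# [mig]
# [-cgi]
# [1g.10gb,MIG 1g.10gb]

show_args nvidia-smi mig -cgi 1g.10gb,”MIG 1g.10gb”
# [nvidia-smi]
# [mig]
# [-cgi]
# [1g.10gb,”MIG]
# [1g.10gb”]
```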




5-5. Verify GI (GPU Instance) Creation

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+







6. Create MIG CIs (Compute Instances)





6-1. Check GIs (GPU Instances)

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+




6-2. Create a Single CI (Compute Instance)

# nvidia-smi mig -cci -gi 11




6-3. Verify CI (Compute Instance) Creation

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+




6-4. Create CIs (Compute Instances) on Multiple GIs at Once

# nvidia-smi mig -cci -gi 12,6
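Instead of typing the GI IDs by hand, they can be derived from the `-lgi` and `-lci` tables. A minimal sketch, assuming both tables use the layouts shown above and are saved to files first; the helper name `mig_gis_without_ci` is hypothetical. It prints, comma-separated and in `-lgi` order, the GIs that have no CI yet, which for the state in 6-3 is `12,6`.

```shell
# Print, comma-separated, the GI IDs listed in an `nvidia-smi mig -lgi`
# dump ($1) that have no compute instance in an `nvidia-smi mig -lci` dump ($2).
mig_gis_without_ci() {
    awk '
        FNR == NR {                                   # first file: -lgi table
            if ($1 == "|" && $3 == "MIG") order[++n] = $6
            next
        }
        $1 == "|" && $4 == "MIG" { has_ci[$3] = 1 }   # second file: -lci table
        END {
            out = ""
            for (i = 1; i <= n; i++)
                if (!(order[i] in has_ci))
                    out = out (out == "" ? "" : ",") order[i]
            print out
        }
    ' "$1" "$2"
}

# Example:
#   nvidia-smi mig -lgi > /tmp/lgi.txt
#   nvidia-smi mig -lci > /tmp/lci.txt
#   ids=$(mig_gis_without_ci /tmp/lgi.txt /tmp/lci.txt)
#   [ -n "$ids" ] && nvidia-smi mig -cci -gi "$ids"
```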




6-5. Verify CI (Compute Instance) Creation

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+







7. Create MIG GIs (GPU Instances) and CIs (Compute Instances) Together




7-1. Create a GI (GPU Instance) and Its CI (Compute Instance) in One Command

# nvidia-smi mig -cgi 19 -C
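Because `-cgi` accepts a comma-separated list (as in 5-4), several GIs and their CIs can be created in one call. A minimal sketch that builds such a list; the helper `mig_repeat_profile` is hypothetical, and the commented example assumes an empty GPU like the one in 4-3, where profile 19 shows 7/7 free.

```shell
# Build a comma-separated list that repeats profile $1 exactly $2 times,
# e.g. "19,19,19" for arguments 19 and 3.
mig_repeat_profile() {
    local profile=$1 count=$2 list="" i=0
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list,}$profile"   # add a comma only after the first item
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Example: split an empty A100 80GB into seven 1g.10gb GIs, each with a CI.
#   nvidia-smi mig -cgi "$(mig_repeat_profile 19 7)" -C
```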




7-2. Verify GI (GPU Instance) Creation

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        7          0:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+




7-3. Verify CI (Compute Instance) Creation

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+







8. Delete MIG CIs (Compute Instances)





8-1. Check CIs (Compute Instances)

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+
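For `-dci`, `-ci` takes the compute instance ID (the CI's Instance ID column, which is 0 for every CI above) and `-gi` takes the GPU instance ID. A minimal sketch (hypothetical helper `mig_dci_commands`) that prints one delete command per row of the `-lci` table, assuming the layout shown above. It only prints the commands, so they can be reviewed before anything is run.

```shell
# Turn each row of `nvidia-smi mig -lci` output into the matching
# "nvidia-smi mig -dci -ci <CI ID> -gi <GI ID>" command (printed, not run).
# A data row looks like: "|   0     11       MIG 1g.10gb   0   0   0:1   |"
#   fields: $3 = GPU instance ID, $7 = compute instance ID
mig_dci_commands() {
    awk '$1 == "|" && $4 == "MIG" {
        printf "nvidia-smi mig -dci -ci %s -gi %s\n", $7, $3
    }'
}

# Example: nvidia-smi mig -lci | mig_dci_commands
```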




8-2. Delete a CI (Compute Instance)

# nvidia-smi mig -dci -ci 0 -gi 11

Successfully destroyed compute instance ID 0 from GPU 0 instance ID 11




8-3. Verify CI (Compute Instance) Deletion

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0      6       MIG 1g.20gb          0         0          0:1     |
+--------------------------------------------------------------------+







9. Delete MIG GIs (GPU Instances)





9-1. Check GIs (GPU Instances)

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        7          0:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+




9-2. Delete a GI (GPU Instance)

# nvidia-smi mig -dgi -gi 11

Successfully destroyed GPU instance ID 11 from GPU 0
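GI 11 could be deleted directly because its only CI was removed in 8-2; in practice a GI that still has CIs has to be emptied first, which is also why section 10 deletes CIs before GIs. A minimal sketch of that two-step order for one GI, assuming that `-dci -gi <ID>` without `-ci` removes all CIs on that GI. The `NVSMI` variable is a hypothetical hook that defaults to `nvidia-smi` and can be set to `echo` for a dry run; `mig_destroy_gi` is likewise a hypothetical name.

```shell
# Command used to talk to the driver; set NVSMI=echo for a dry run.
NVSMI=${NVSMI:-nvidia-smi}

# Destroy GPU instance $1, removing any compute instances on it first.
mig_destroy_gi() {
    local gi=$1
    # May report an error when the GI has no CI left (as with GI 11
    # after 8-2); that error is ignored so the GI deletion still runs.
    "$NVSMI" mig -dci -gi "$gi" 2>/dev/null || :
    "$NVSMI" mig -dgi -gi "$gi"
}

# Example (as root): mig_destroy_gi 11
```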




9-3. Verify GI (GPU Instance) Deletion

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19        7          0:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   0  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+







10. Delete All GIs (GPU Instances) and CIs (Compute Instances)





10-1. Delete All CIs (Compute Instances)

# nvidia-smi mig -dci

Successfully destroyed compute instance ID 0 from GPU 0 instance ID 7
Successfully destroyed compute instance ID 0 from GPU 0 instance ID 12
Successfully destroyed compute instance ID 0 from GPU 0 instance ID 6




10-2. Delete All GIs (GPU Instances)

# nvidia-smi mig -dgi

Successfully destroyed GPU instance ID 7 from GPU 0
Successfully destroyed GPU instance ID 12 from GPU 0
Successfully destroyed GPU instance ID 6 from GPU 0




10-3. Check CIs (Compute Instances)

# nvidia-smi mig -lci

No GPU Instances found: Not Found




10-4. Check GIs (GPU Instances)

# nvidia-smi mig -lgi

No GPU Instances found: Not Found







11. MIG





11-1. Disable MIG

# nvidia-smi -i 0 -mig 0
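Sections 10 and 11 together form the full teardown, and the order matters: CIs first, then GIs, then MIG mode. A minimal sketch wrapping that order in one function (`mig_teardown` is a hypothetical name). The `NVSMI` variable is a hypothetical hook that defaults to `nvidia-smi` and can be set to `echo` for a dry run. As in 10-1 and 10-2, `-dci` and `-dgi` without `-gi` act on every instance.

```shell
# Command used to talk to the driver; set NVSMI=echo for a dry run.
NVSMI=${NVSMI:-nvidia-smi}

# Tear down every MIG instance, then disable MIG mode on GPU $1 (default 0).
# Order matters: compute instances, then GPU instances, then MIG mode.
# A step may report an error when there is nothing left to delete; the
# following steps still run.
mig_teardown() {
    local gpu=${1:-0}
    "$NVSMI" mig -dci            # all CIs, as in 10-1
    "$NVSMI" mig -dgi            # all GIs, as in 10-2
    "$NVSMI" -i "$gpu" -mig 0    # disable MIG, as in 11-1
}

# Example (as root): mig_teardown 0
```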





11-2. Verify MIG Is Disabled

# nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          Off | 00000000:03:00.0 Off |                    0 |
| N/A   43C    P0              68W / 300W |      4MiB / 81920MiB |     24%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+



seuheu
