[Rocky] Multi-GPU MIG Configuration







1. Overview

This post shows how to create and delete MIG instances on a system with multiple GPUs, either on a single GPU or on several GPUs at once.







2. Version

Rocky 8.7







3. Reference Links





3-1. [Rocky] NVIDIA Graphics Driver Installation

BLOG
YouTube




3-2. [Rocky] What Is NVIDIA MIG (Multi-Instance GPU)? (1)

BLOG
YouTube




3-3. [Rocky] NVIDIA MIG (Multi-Instance GPU) Setup, Creation, and Deletion (2)

BLOG
YouTube




3-4. [Rocky] NVIDIA MIG (Multi-Instance GPU) Test (3)

BLOG
YouTube







4. Configuring All GPUs





4-1. Enabling MIG on All GPUs




4-1-1. Check the state before configuring

# nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:19:00.0 Off |                    0 |
| N/A   37C    P0            115W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:2D:00.0 Off |                    0 |
| N/A   38C    P0            115W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                    0 |
| N/A   39C    P0            116W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:66:00.0 Off |                    0 |
| N/A   37C    P0            117W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                    0 |
| N/A   36C    P0            116W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:AE:00.0 Off |                    0 |
| N/A   40C    P0            127W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                    0 |
| N/A   40C    P0            122W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:E4:00.0 Off |                    0 |
| N/A   37C    P0            115W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
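
For scripting, the same check can be made without reading the whole table. The following is a minimal sketch that assumes the installed driver exposes the `mig.mode.current` query field; the function name `mig_disabled_gpus` is hypothetical and only used for illustration.

```shell
# Print the index of every GPU whose current MIG mode is not "Enabled".
# Input (stdin): lines of "index, mode", as produced by
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
# mig_disabled_gpus is a hypothetical helper name, not part of nvidia-smi.
mig_disabled_gpus() {
    awk -F', *' '$2 != "Enabled" { print $1 }'
}

# Usage on a real host (not executed here):
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader | mig_disabled_gpus
```

On the system shown above, all eight indexes would be listed because every GPU reports `Disabled`.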



4-1-2. Enable MIG on all GPUs

# nvidia-smi -mig 1

Enabled MIG Mode for GPU 00000000:19:00.0

Warning: persistence mode is disabled on device 00000000:19:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:2D:00.0

Warning: persistence mode is disabled on device 00000000:2D:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:3F:00.0

Warning: persistence mode is disabled on device 00000000:3F:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:66:00.0

Warning: persistence mode is disabled on device 00000000:66:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:9B:00.0

Warning: persistence mode is disabled on device 00000000:9B:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:AE:00.0

Warning: persistence mode is disabled on device 00000000:AE:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:BF:00.0

Warning: persistence mode is disabled on device 00000000:BF:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:E4:00.0

Warning: persistence mode is disabled on device 00000000:E4:00.0. See the Known Issues section of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more information on how to enable persistence mode.
All done.
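
The warning printed for each GPU refers to persistence mode being off. One way to address it is to turn persistence mode on with `nvidia-smi -pm 1` before enabling MIG. The sketch below assumes that approach is acceptable on your system (NVIDIA also ships the `nvidia-persistenced` daemon for the same purpose). `NVIDIA_SMI` and `enable_mig_all` are hypothetical names introduced here so that the sequence can be dry-run.

```shell
# NVIDIA_SMI is a hypothetical override: it defaults to the real binary,
# but can point at a stub to preview the commands without touching the GPUs.
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"

# enable_mig_all is a hypothetical helper name.
enable_mig_all() {
    # Enable persistence mode on all GPUs first (assumption: this is what
    # the warning above is asking for).
    "$NVIDIA_SMI" -pm 1 || return 1
    # Without -i, -mig is applied to every GPU, as in the command above.
    "$NVIDIA_SMI" -mig 1
}

# Usage on a real host (not executed here):
#   enable_mig_all
```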



4-1-3. Verify that MIG is enabled on all GPUs

# nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:19:00.0 Off |                   On |
| N/A   37C    P0            115W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:2D:00.0 Off |                   On |
| N/A   38C    P0            115W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                   On |
| N/A   39C    P0            116W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:66:00.0 Off |                   On |
| N/A   37C    P0            117W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                   On |
| N/A   36C    P0            115W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:AE:00.0 Off |                   On |
| N/A   40C    P0            126W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                   On |
| N/A   40C    P0            122W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:E4:00.0 Off |                   On |
| N/A   37C    P0            115W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  No MIG devices found                                                                   |
+-----------------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+




4-2. Creating MIG Instances on All GPUs




4-2-1. Check which GI profiles can be created

# nvidia-smi mig -i 0 -lgip

+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    instance   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     7/7        9.75       No     16     1     0   |
|                                                             1     1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.75       No     16     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.20gb       15     4/4        19.62      No     26     1     0   |
|                                                             1     1     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.62      No     32     2     0   |
|                                                             2     2     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     2/2        39.38      No     60     3     0   |
|                                                             3     3     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.38      No     64     4     0   |
|                                                             4     4     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     1/1        79.12      No     132    7     0   |
|                                                             8     7     1   |
+-----------------------------------------------------------------------------+
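
The profile ID column is the value passed to `-cgi` in the following steps. As a rough sketch, the table can be reduced to one line per profile with `awk`; this assumes the table layout shown above, which may differ between driver versions. `list_gi_profiles` is a hypothetical helper name.

```shell
# Reduce `nvidia-smi mig -lgip` output (stdin) to: name, profile ID,
# free/total instances, memory in GiB. One profile per line.
# Assumes the column layout shown above; list_gi_profiles is a hypothetical name.
list_gi_profiles() {
    # Profile rows are the only rows whose third field is the literal "MIG".
    awk '$1 == "|" && $3 == "MIG" { print $4, $5, $6, $7 }'
}

# Usage on a real host (not executed here):
#   nvidia-smi mig -i 0 -lgip | list_gi_profiles
```

For the first row of the table above, this prints `1g.10gb 19 7/7 9.75`.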



4-2-2. Create a GI on all GPUs

nvidia-smi mig -cgi <GPU instance profiles ID>


# nvidia-smi mig -cgi 20

Successfully created GPU instance ID 13 on GPU  0 using profile MIG 1g.10gb+me (ID 20)
Successfully created GPU instance ID 13 on GPU  1 using profile MIG 1g.10gb+me (ID 20)
Successfully created GPU instance ID 13 on GPU  2 using profile MIG 1g.10gb+me (ID 20)
Successfully created GPU instance ID  9 on GPU  3 using profile MIG 1g.10gb+me (ID 20)
Successfully created GPU instance ID 13 on GPU  4 using profile MIG 1g.10gb+me (ID 20)
Successfully created GPU instance ID  9 on GPU  5 using profile MIG 1g.10gb+me (ID 20)
Successfully created GPU instance ID 13 on GPU  6 using profile MIG 1g.10gb+me (ID 20)
Successfully created GPU instance ID 13 on GPU  7 using profile MIG 1g.10gb+me (ID 20)



4-2-3. Verify the created GIs

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   1  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   2  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   3  MIG 1g.10gb+me      20        9          6:1     |
+-------------------------------------------------------+
|   4  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   5  MIG 1g.10gb+me      20        9          6:1     |
+-------------------------------------------------------+
|   6  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   7  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
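
Note that the GPU instance ID is not the same on every GPU: most GPUs received ID 13, while GPUs 3 and 5 received ID 9. Commands that take `-gi` therefore need every ID that is in use. The sketch below collects the distinct instance IDs for one profile from the `-lgi` table; it assumes the layout shown above, and `gi_ids_for_profile` is a hypothetical helper name.

```shell
# Print the distinct GPU instance IDs that use a given GI profile,
# joined with commas so the result can be passed to -gi.
#   $1    : GI profile ID (for example 20)
#   stdin : output of `nvidia-smi mig -lgi`
# gi_ids_for_profile is a hypothetical helper name.
gi_ids_for_profile() {
    awk -v want="$1" '
        $3 == "MIG" && $5 == want && !seen[$6]++ {
            sep = (ids == "" ? "" : ",")
            ids = ids sep $6
        }
        END { print ids }
    '
}

# Usage on a real host (not executed here):
#   nvidia-smi mig -lgi | gi_ids_for_profile 20
```

With the table above and profile ID 20, the result is `13,9`.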



4-2-4. Create CIs on all created GIs

# nvidia-smi mig -cci

Successfully created compute instance ID  0 on GPU  0 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  1 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  2 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  3 GPU instance ID  9 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  4 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  5 GPU instance ID  9 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  6 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  7 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)



4-2-4. Verify the created CIs

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   1     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   3      9       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   4     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   5      9       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   6     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   7     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+



4-2-5. Create GIs and CIs in one step on all GPUs

nvidia-smi mig -cgi <GPU instance profiles ID> -C


# nvidia-smi mig -cgi 19 -C

Successfully created GPU instance ID 11 on GPU  0 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID 11 on GPU  1 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  1 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID 11 on GPU  2 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  2 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID  7 on GPU  3 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  3 GPU instance ID  7 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID 11 on GPU  4 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  4 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID  7 on GPU  5 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  5 GPU instance ID  7 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID 11 on GPU  6 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  6 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID 11 on GPU  7 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  7 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
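
`-cgi` also accepts a comma-separated list of profile IDs, so several instances of the same profile can be requested in one call. The sketch below only builds that list; `repeat_profile` is a hypothetical helper name, and whether the requested count fits depends on the Free/Total column from 4-2-1.

```shell
# Build a comma-separated list that repeats one profile ID N times,
# for example "19,19,19", suitable as the argument to -cgi.
#   $1 : GI profile ID
#   $2 : number of instances to request
# repeat_profile is a hypothetical helper name.
repeat_profile() {
    awk -v id="$1" -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++) {
            if (i > 1) s = s ","
            s = s id
        }
        print s
    }'
}

# Usage on a real host (not executed here):
#   nvidia-smi mig -cgi "$(repeat_profile 19 3)" -C
```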



4-2-6. Verify the created GIs

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   0  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   1  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   1  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   2  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   2  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   3  MIG 1g.10gb         19        7          4:1     |
+-------------------------------------------------------+
|   3  MIG 1g.10gb+me      20        9          6:1     |
+-------------------------------------------------------+
|   4  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   4  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   5  MIG 1g.10gb         19        7          4:1     |
+-------------------------------------------------------+
|   5  MIG 1g.10gb+me      20        9          6:1     |
+-------------------------------------------------------+
|   6  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   6  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+
|   7  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   7  MIG 1g.10gb+me      20       13          6:1     |
+-------------------------------------------------------+



4-2-7. Verify the created CIs

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   0     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   1     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   1     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   3      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   3      9       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   4     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   4     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   5      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   5      9       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   6     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   6     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   7     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   7     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+




4-3. Deleting MIG Instances on All GPUs




4-3-1. Delete selected CIs on all GPUs

nvidia-smi mig -dci -gi <GPU instance ID>


# nvidia-smi mig -dci -gi 13,9

Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID 13
Successfully destroyed compute instance ID  0 from GPU  1 GPU instance ID 13
Successfully destroyed compute instance ID  0 from GPU  2 GPU instance ID 13
Successfully destroyed compute instance ID  0 from GPU  3 GPU instance ID  9
Successfully destroyed compute instance ID  0 from GPU  4 GPU instance ID 13
Successfully destroyed compute instance ID  0 from GPU  5 GPU instance ID  9
Successfully destroyed compute instance ID  0 from GPU  6 GPU instance ID 13
Successfully destroyed compute instance ID  0 from GPU  7 GPU instance ID 13



4-3-2. Verify that the selected CIs were deleted on all GPUs

# nvidia-smi mig -lci

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   1     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   3      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   4     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   5      7       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   6     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   7     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+



4-3-3. Delete selected GIs on all GPUs

nvidia-smi mig -dgi -gi <GPU instance ID>


# nvidia-smi mig -dgi -gi 13,9

Successfully destroyed GPU instance ID 13 from GPU  0
Successfully destroyed GPU instance ID 13 from GPU  1
Successfully destroyed GPU instance ID 13 from GPU  2
Successfully destroyed GPU instance ID  9 from GPU  3
Successfully destroyed GPU instance ID 13 from GPU  4
Successfully destroyed GPU instance ID  9 from GPU  5
Successfully destroyed GPU instance ID 13 from GPU  6
Successfully destroyed GPU instance ID 13 from GPU  7



4-3-4. Verify that the selected GIs were deleted on all GPUs

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   1  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   2  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   3  MIG 1g.10gb         19        7          4:1     |
+-------------------------------------------------------+
|   4  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   5  MIG 1g.10gb         19        7          4:1     |
+-------------------------------------------------------+
|   6  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   7  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+



4-3-5. Delete all CIs

# nvidia-smi mig -dci

Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  1 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  2 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  3 GPU instance ID  7
Successfully destroyed compute instance ID  0 from GPU  4 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  5 GPU instance ID  7
Successfully destroyed compute instance ID  0 from GPU  6 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  7 GPU instance ID 11



4-3-6. Verify that all CIs were deleted

# nvidia-smi mig -lci

No compute instance found: Not Found



4-3-7. Delete all GIs

# nvidia-smi mig -dgi

Successfully destroyed GPU instance ID 11 from GPU  0
Successfully destroyed GPU instance ID 11 from GPU  1
Successfully destroyed GPU instance ID 11 from GPU  2
Successfully destroyed GPU instance ID  7 from GPU  3
Successfully destroyed GPU instance ID 11 from GPU  4
Successfully destroyed GPU instance ID  7 from GPU  5
Successfully destroyed GPU instance ID 11 from GPU  6
Successfully destroyed GPU instance ID 11 from GPU  7



4-3-8. Verify that all GIs were deleted

# nvidia-smi mig -lgi

No GPU instance found: Not Found




4-4. Disabling MIG on All GPUs




4-4-1. Disable MIG on all GPUs

# nvidia-smi -mig 0

Disabled MIG Mode for GPU 00000000:19:00.0

Warning: persistence mode is disabled on device 00000000:19:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:2D:00.0

Warning: persistence mode is disabled on device 00000000:2D:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:3F:00.0

Warning: persistence mode is disabled on device 00000000:3F:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:66:00.0

Warning: persistence mode is disabled on device 00000000:66:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:9B:00.0

Warning: persistence mode is disabled on device 00000000:9B:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:AE:00.0

Warning: persistence mode is disabled on device 00000000:AE:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:BF:00.0

Warning: persistence mode is disabled on device 00000000:BF:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:E4:00.0

Warning: persistence mode is disabled on device 00000000:E4:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
All done.



4-4-2. Check MIG status on all GPUs

# nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:19:00.0 Off |                    0 |
| N/A   35C    P0             75W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:2D:00.0 Off |                    0 |
| N/A   37C    P0             75W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                    0 |
| N/A   41C    P0            157W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:66:00.0 Off |                    0 |
| N/A   39C    P0            158W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                    0 |
| N/A   38C    P0            156W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:AE:00.0 Off |                    0 |
| N/A   42C    P0            170W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                    0 |
| N/A   41C    P0            164W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:E4:00.0 Off |                    0 |
| N/A   38C    P0            156W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
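Scanning the full table for the `MIG M.` column gets tedious with eight GPUs. As a minimal sketch, the helper below (the name `mig_enabled_gpus` is hypothetical) reads `index, mode` CSV lines and prints the index of every GPU whose MIG mode is still `Enabled`. It assumes the input comes from the `index` and `mig.mode.current` query fields of nvidia-smi, which this post does not otherwise use.

```shell
# Print the index of every GPU whose current MIG mode is "Enabled".
# Expects CSV lines of the form "<index>, <mode>" on stdin.
mig_enabled_gpus() {
    awk -F', *' '$2 == "Enabled" { print $1 }'
}

# Assumed usage on a real host:
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader | mig_enabled_gpus
# No output means no GPU is left in MIG mode.
```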







5. Single-GPU configuration





5-1. Enable MIG on a single GPU




5-1-1. Enable MIG mode on a single GPU

nvidia-smi -i <GPU ID> -mig 1


# nvidia-smi -i 2 -mig 1

Enabled MIG Mode for GPU 00000000:3F:00.0

Warning: persistence mode is disabled on device 00000000:3F:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
All done.



5-1-2. Verify MIG is enabled on a single GPU

# nvidia-smi

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:19:00.0 Off |                    0 |
| N/A   39C    P0            154W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:2D:00.0 Off |                    0 |
| N/A   40C    P0            155W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                   On |
| N/A   41C    P0            160W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:66:00.0 Off |                    0 |
| N/A   39C    P0            157W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                    0 |
| N/A   38C    P0            155W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          Off |   00000000:AE:00.0 Off |                    0 |
| N/A   42C    P0            172W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                    0 |
| N/A   41C    P0            163W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          Off |   00000000:E4:00.0 Off |                    0 |
| N/A   38C    P0            156W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  No MIG devices found                                                                   |
+-----------------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+




5-2. Create a MIG GI on a single GPU




5-2-1. Create a MIG GI on a single GPU

nvidia-smi mig -i <GPU ID> -cgi <GPU instance profiles ID>


# nvidia-smi mig -i 2 -cgi 19

Successfully created GPU instance ID 13 on GPU 2 using profile MIG 1g.10gb (ID 19)



5-2-2. Verify MIG GI creation on a single GPU

# nvidia-smi mig -lgi

+-------------------------------------------------------+
| GPU instance:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   2  MIG 1g.10gb         19       13          6:1     |
+-------------------------------------------------------+
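The CI step that follows needs the GPU instance ID reported here (13 in this run). Rather than copying it by hand, the ID can be pulled out of the creation message. This is a sketch that assumes the message wording stays exactly as shown in 5-2-1; the function name `created_gi_ids` is hypothetical.

```shell
# Print the GPU instance ID from each "Successfully created GPU instance" line on stdin.
# Lines about compute instances do not match and are ignored.
created_gi_ids() {
    sed -n 's/^Successfully created GPU instance ID *\([0-9][0-9]*\) on GPU.*/\1/p'
}

# Assumed usage on a real host:
#   gi=$(nvidia-smi mig -i 2 -cgi 19 | created_gi_ids)
#   nvidia-smi mig -i 2 -cci -gi "$gi"
```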




5-3. Create a MIG CI on a single GPU




5-3-1. Create a MIG CI on a single GPU

nvidia-smi mig -i <GPU ID> -cci -gi <GPU instance ID>


# nvidia-smi mig -i 2 -cci -gi 13

Successfully created compute instance ID  0 on GPU  2 GPU instance ID 13 using profile MIG 1g.10gb (ID  0)



5-3-2. Verify MIG CI creation on a single GPU

nvidia-smi mig -i <GPU ID> -lci


# nvidia-smi mig -i 2 -lci

+--------------------------------------------------------------------+
| Compute instance:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   2     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+




5-4. Create a MIG GI and CI together on a single GPU




5-4-1. Create a MIG GI and CI together on a single GPU

nvidia-smi mig -i <GPU ID> -cgi <GPU instance profiles ID> -C


# nvidia-smi mig -i 2 -cgi 19 -C

Successfully created GPU instance ID 11 on GPU  2 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  2 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)



5-4-2. Verify MIG GI and CI creation on a single GPU

nvidia-smi mig -i <GPU ID> -lgi


# nvidia-smi mig -i 2 -lgi

+-------------------------------------------------------+
| GPU instance:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   2  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   2  MIG 1g.10gb         19       13          6:1     |
+-------------------------------------------------------+


nvidia-smi mig -i <GPU ID> -lci


# nvidia-smi mig -i 2 -lci

+--------------------------------------------------------------------+
| Compute instance:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   2     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+




5-5. Delete a MIG CI on a single GPU




5-5-1. Delete a MIG CI on a single GPU

nvidia-smi mig -i <GPU ID> -dci -gi <GPU Instance ID>


# nvidia-smi mig -i 2 -dci -gi 11

Successfully destroyed compute instance ID  0 from GPU  2 GPU instance ID 11



5-5-2. Verify MIG CI deletion on a single GPU

nvidia-smi mig -i <GPU ID> -lci


# nvidia-smi mig -i 2 -lci

+--------------------------------------------------------------------+
| Compute instance:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   2     13       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+




5-6. Delete a MIG GI on a single GPU




5-6-1. Delete a MIG GI on a single GPU

nvidia-smi mig -i <GPU ID> -dgi -gi <GPU Instance ID>


# nvidia-smi mig -i 2 -dgi -gi 11

Successfully destroyed GPU instance ID 11 from GPU  2



5-6-2. Verify MIG GI deletion on a single GPU

nvidia-smi mig -i <GPU ID> -lgi


# nvidia-smi mig -i 2 -lgi

+-------------------------------------------------------+
| GPU instance:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   2  MIG 1g.10gb         19       13          6:1     |
+-------------------------------------------------------+
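The `-dci` and `-dgi` commands in 5-5 and 5-6 take GI IDs that were read off the `-lgi` listing by eye. The sketch below collects them from the listing instead and joins them with commas, the form the `-gi` option is given in the multi-GPU section. It assumes the table layout shown above, with the instance ID as the fifth value in each data row; `gi_ids_for_gpu` is a hypothetical name.

```shell
# Read `nvidia-smi mig -lgi` output on stdin and print the GI IDs that belong
# to the GPU index given as $1, joined with commas.
gi_ids_for_gpu() {
    awk -v gpu="$1" '
        $1 == "|" && $2 == gpu && $3 == "MIG" { ids = ids (ids == "" ? "" : ",") $6 }
        END { if (ids != "") print ids }'
}

# Assumed usage on a real host:
#   ids=$(nvidia-smi mig -i 2 -lgi | gi_ids_for_gpu 2)
#   nvidia-smi mig -i 2 -dci -gi "$ids"
#   nvidia-smi mig -i 2 -dgi -gi "$ids"
```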




5-7. Disable MIG on a single GPU




5-7-1. Disable MIG mode on a single GPU

nvidia-smi -i <GPU ID> -mig 0



# nvidia-smi -i 2 -mig 0

Disabled MIG Mode for GPU 00000000:3F:00.0

Warning: persistence mode is disabled on device 00000000:3F:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
All done.



5-7-2. Verify MIG is disabled on a single GPU

nvidia-smi -i <GPU ID>


# nvidia-smi -i 2

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                    0 |
| N/A   41C    P0            159W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+







6. Multi-GPU configuration





6-1. Enable MIG on multiple GPUs




6-1-1. Enable MIG mode on multiple GPUs

nvidia-smi -i <GPU ID> -mig 1


# nvidia-smi -i 2,4,6 -mig 1

Enabled MIG Mode for GPU 00000000:3F:00.0

Warning: persistence mode is disabled on device 00000000:3F:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:9B:00.0

Warning: persistence mode is disabled on device 00000000:9B:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Enabled MIG Mode for GPU 00000000:BF:00.0

Warning: persistence mode is disabled on device 00000000:BF:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
All done.
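Each GPU contributes a three-line persistence-mode warning to this output, which buries the lines that matter. A small filter (hypothetical name `mig_mode_summary`) keeps only the per-GPU status lines. It is a sketch that assumes the `Enabled MIG Mode for GPU` / `Disabled MIG Mode for GPU` wording shown in this post.

```shell
# Keep only the per-GPU "Enabled/Disabled MIG Mode" lines from stdin,
# dropping the persistence-mode warnings and the trailing "All done.".
mig_mode_summary() {
    grep -E '^(Enabled|Disabled) MIG Mode for GPU '
}

# Assumed usage on a real host:
#   nvidia-smi -i 2,4,6 -mig 1 | mig_mode_summary
```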



6-1-2. Verify MIG is enabled on multiple GPUs

nvidia-smi -i <GPU ID>


# nvidia-smi -i 2,4,6

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                   On |
| N/A   41C    P0            160W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                   On |
| N/A   38C    P0            159W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                   On |
| N/A   42C    P0            167W /  700W |       1MiB /  81559MiB |     N/A      Default |
|                                         |                        |              Enabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  No MIG devices found                                                                   |
+-----------------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+




6-2. Create MIG GIs on multiple GPUs




6-2-1. Create MIG GIs on multiple GPUs

nvidia-smi mig -i <GPU ID> -cgi <GPU instance profiles ID>



# nvidia-smi mig -i 2,4,6 -cgi 15,19

Successfully created GPU instance ID  6 on GPU  2 using profile MIG 1g.20gb (ID 15)
Successfully created GPU instance ID 11 on GPU  2 using profile MIG 1g.10gb (ID 19)
Successfully created GPU instance ID  6 on GPU  4 using profile MIG 1g.20gb (ID 15)
Successfully created GPU instance ID 11 on GPU  4 using profile MIG 1g.10gb (ID 19)
Successfully created GPU instance ID  6 on GPU  6 using profile MIG 1g.20gb (ID 15)
Successfully created GPU instance ID 11 on GPU  6 using profile MIG 1g.10gb (ID 19)



6-2-2. Verify MIG GI creation on multiple GPUs

nvidia-smi mig -i <GPU ID> -lgi

# nvidia-smi mig -i 2,4,6 -lgi

+-------------------------------------------------------+
| GPU instance:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   2  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   2  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
|   4  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   4  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
|   6  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   6  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
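As the output of 6-2-1 shows, a comma-separated GPU list combined with a comma-separated profile list creates every profile on every GPU: 3 GPUs x 2 profiles gave 6 GPU instances. The sketch below only enumerates those pairs so the expected count can be checked before running the real command; `expand_gi_requests` is a hypothetical helper and does not call nvidia-smi.

```shell
# Print one "<gpu> <profile>" line for every GI that a combined
# `-i <gpus> -cgi <profiles>` call is expected to create.
# $1: comma-separated GPU indices, $2: comma-separated profile IDs.
expand_gi_requests() {
    for g in $(echo "$1" | tr ',' ' '); do
        for p in $(echo "$2" | tr ',' ' '); do
            printf '%s %s\n' "$g" "$p"
        done
    done
}

# Example: count the instances that `-i 2,4,6 -cgi 15,19` would request.
#   expand_gi_requests 2,4,6 15,19 | wc -l
```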




6-3. Create MIG CIs on multiple GPUs




6-3-1. Create MIG CIs on multiple GPUs

nvidia-smi mig -i <GPU ID> -cci -gi <GPU instance ID>


# nvidia-smi mig -i 2,4,6 -cci -gi 6,11

Successfully created compute instance ID  0 on GPU  2 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  2 GPU instance ID  6 using profile MIG 1g.20gb (ID  7)
Successfully created compute instance ID  0 on GPU  4 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  4 GPU instance ID  6 using profile MIG 1g.20gb (ID  7)
Successfully created compute instance ID  0 on GPU  6 GPU instance ID 11 using profile MIG 1g.10gb (ID  0)
Successfully created compute instance ID  0 on GPU  6 GPU instance ID  6 using profile MIG 1g.20gb (ID  7)



6-3-2. Verify MIG CI creation on multiple GPUs

nvidia-smi mig -i <GPU ID> -lci


# nvidia-smi mig -i 2,4,6 -lci

+--------------------------------------------------------------------+
| Compute instance:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   2     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2      6       MIG 1g.20gb          7         0          0:2     |
+--------------------------------------------------------------------+
|   4     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   4      6       MIG 1g.20gb          7         0          0:2     |
+--------------------------------------------------------------------+
|   6     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   6      6       MIG 1g.20gb          7         0          0:2     |
+--------------------------------------------------------------------+




6-4. Create MIG GIs and CIs together on multiple GPUs




6-4-1. Create MIG GIs and CIs together on multiple GPUs

nvidia-smi mig -i <GPU ID> -cgi <GPU instance profiles ID> -C


# nvidia-smi mig -i 2,4,6 -cgi 19 -C

Successfully created GPU instance ID 12 on GPU  2 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  2 GPU instance ID 12 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID 12 on GPU  4 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  4 GPU instance ID 12 using profile MIG 1g.10gb (ID  0)
Successfully created GPU instance ID 12 on GPU  6 using profile MIG 1g.10gb (ID 19)
Successfully created compute instance ID  0 on GPU  6 GPU instance ID 12 using profile MIG 1g.10gb (ID  0)



6-4-2. Verify MIG GI and CI creation on multiple GPUs

nvidia-smi mig -i <GPU ID> -lgi


# nvidia-smi mig -i 2,4,6 -lgi

+-------------------------------------------------------+
| GPU instance:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   2  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   2  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   2  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
|   4  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   4  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   4  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
|   6  MIG 1g.10gb         19       11          4:1     |
+-------------------------------------------------------+
|   6  MIG 1g.10gb         19       12          5:1     |
+-------------------------------------------------------+
|   6  MIG 1g.20gb         15        6          6:2     |
+-------------------------------------------------------+
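The `Placement Start:Size` column shows where each GI sits on the GPU. On GPU 2 the three instances are listed at 4:1, 5:1 and 6:2. Reading each value as "first slot : number of slots" (my interpretation of the column header, not something this post states), they cover slots 4, 5, 6 and 7 with no overlap. The sketch below expands such values into slot numbers; `placement_slots` is a hypothetical name.

```shell
# Read "Start:Size" values (one per line) on stdin and print every slot
# they cover on a single line, e.g. "6:2" covers slots 6 and 7.
placement_slots() {
    awk -F: '
        { for (i = $1; i < $1 + $2; i++) printf "%s%d", (n++ ? " " : ""), i }
        END { if (n) print "" }'
}

# Example with the GPU 2 placements listed above:
#   printf '4:1\n5:1\n6:2\n' | placement_slots
```

A slot number that appears twice in the output would indicate two instances claiming the same slot.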


nvidia-smi mig -i <GPU ID> -lci


# nvidia-smi mig -i 2,4,6 -lci

+--------------------------------------------------------------------+
| Compute instance:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   2     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   2      6       MIG 1g.20gb          7         0          0:2     |
+--------------------------------------------------------------------+
|   4     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   4     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   4      6       MIG 1g.20gb          7         0          0:2     |
+--------------------------------------------------------------------+
|   6     11       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   6     12       MIG 1g.10gb          0         0          0:1     |
+--------------------------------------------------------------------+
|   6      6       MIG 1g.20gb          7         0          0:2     |
+--------------------------------------------------------------------+




6-5. Delete MIG CIs on multiple GPUs




6-5-1. Delete MIG CIs on multiple GPUs

nvidia-smi mig -i <GPU ID> -dci -gi <GPU instance ID>


# nvidia-smi mig -i 2,4,6 -dci -gi 6,11,12

Successfully destroyed compute instance ID  0 from GPU  2 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  2 GPU instance ID 12
Successfully destroyed compute instance ID  0 from GPU  2 GPU instance ID  6
Successfully destroyed compute instance ID  0 from GPU  4 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  4 GPU instance ID 12
Successfully destroyed compute instance ID  0 from GPU  4 GPU instance ID  6
Successfully destroyed compute instance ID  0 from GPU  6 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  6 GPU instance ID 12
Successfully destroyed compute instance ID  0 from GPU  6 GPU instance ID  6



6-5-2. Verify MIG CI deletion on multiple GPUs

nvidia-smi mig -i <GPU ID> -lci


# nvidia-smi mig -i 2,4,6 -lci

No compute instance found: Not Found




6-6. Delete MIG GIs on multiple GPUs




6-6-1. Delete MIG GIs on multiple GPUs

nvidia-smi mig -i <GPU ID> -dgi -gi <GPU instance ID>


# nvidia-smi mig -i 2,4,6 -dgi -gi 6,11,12

Successfully destroyed GPU instance ID 11 from GPU  2
Successfully destroyed GPU instance ID 12 from GPU  2
Successfully destroyed GPU instance ID  6 from GPU  2
Successfully destroyed GPU instance ID 11 from GPU  4
Successfully destroyed GPU instance ID 12 from GPU  4
Successfully destroyed GPU instance ID  6 from GPU  4
Successfully destroyed GPU instance ID 11 from GPU  6
Successfully destroyed GPU instance ID 12 from GPU  6
Successfully destroyed GPU instance ID  6 from GPU  6



6-6-2. Verify MIG GI deletion on multiple GPUs

nvidia-smi mig -i <GPU ID> -lgi


# nvidia-smi mig -i 2,4,6 -lgi

No GPU instance found: Not Found
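This post tears MIG down in a fixed order: compute instances first, then GPU instances, then MIG mode itself. The sketch below prints those three commands for a given GPU list and GI list so they can be reviewed before anything is run. `mig_teardown_cmds` is a hypothetical helper; it only echoes the commands used in sections 6-5 through 6-7 and does not call nvidia-smi.

```shell
# Print the teardown commands in the order used in this post.
# $1: comma-separated GPU indices, $2: comma-separated GPU instance IDs.
mig_teardown_cmds() {
    printf 'nvidia-smi mig -i %s -dci -gi %s\n' "$1" "$2"   # 1. delete compute instances
    printf 'nvidia-smi mig -i %s -dgi -gi %s\n' "$1" "$2"   # 2. delete GPU instances
    printf 'nvidia-smi -i %s -mig 0\n' "$1"                 # 3. disable MIG mode
}

# Example: show the commands for GPUs 2,4,6 and GIs 6,11,12.
#   mig_teardown_cmds 2,4,6 6,11,12
```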




6-7. Disable MIG on multiple GPUs




6-7-1. Disable MIG on multiple GPUs

nvidia-smi -i <GPU ID> -mig 0


# nvidia-smi -i 2,4,6 -mig 0

Disabled MIG Mode for GPU 00000000:3F:00.0

Warning: persistence mode is disabled on device 00000000:3F:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:9B:00.0

Warning: persistence mode is disabled on device 00000000:9B:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
Disabled MIG Mode for GPU 00000000:BF:00.0

Warning: persistence mode is disabled on device 00000000:BF:00.0. See the Known Issues section
 of the nvidia-smi(1) man page for more information. Run with [--help | -h] switch to get more
 information on how to enable persistence mode.
All done.
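
The persistence-mode warning is informational here; the command still reports that MIG mode was disabled on each GPU.

Sections 6-5 through 6-7 remove things in a fixed order: compute instances first, then GPU instances, then MIG mode itself. The sketch below prints that sequence for a given set of IDs so it can be reviewed before running. The function name `mig_teardown_plan` is hypothetical. The last command uses the top-level `-mig` option (`nvidia-smi -i <GPU ID> -mig 0`), which is the form documented in NVIDIA's MIG user guide.

```shell
# Hypothetical helper: print the teardown commands in the order used in this post.
# Nothing is executed; run the printed commands only after reviewing them.
mig_teardown_plan() {
    gpus=$1   # comma-separated GPU IDs, e.g. 2,4,6
    gis=$2    # comma-separated GPU instance IDs, e.g. 6,11,12
    echo "nvidia-smi mig -i $gpus -dci -gi $gis"   # 1. delete compute instances
    echo "nvidia-smi mig -i $gpus -dgi -gi $gis"   # 2. delete GPU instances
    echo "nvidia-smi -i $gpus -mig 0"              # 3. disable MIG mode
}

mig_teardown_plan "2,4,6" "6,11,12"
```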



6-7-2. Verify that MIG is disabled on multiple GPUs

nvidia-smi -i <GPU ID>


# nvidia-smi -i 2,4,6

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:3F:00.0 Off |                    0 |
| N/A   41C    P0            157W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          Off |   00000000:9B:00.0 Off |                    0 |
| N/A   38C    P0            157W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          Off |   00000000:BF:00.0 Off |                    0 |
| N/A   41C    P0            163W /  700W |       1MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
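
In the table above, the "MIG M." field reads Disabled for GPUs 2, 4, and 6. For scripted checks, nvidia-smi can also report this field in CSV form through `--query-gpu=index,mig.mode.current --format=csv,noheader`. The sketch below filters such output and lists any GPU whose mode is not Disabled. The helper name `mig_still_enabled` is hypothetical, and the sample input is written by hand on the assumption that each line has the form `index, mode`.

```shell
# Hypothetical helper: read "index, mode" lines from stdin and
# print the index of every GPU whose MIG mode is not "Disabled".
mig_still_enabled() {
    awk -F', *' '$2 != "Disabled" { print $1 }'
}

# Usage on a real system (assumes nvidia-smi is installed):
#   nvidia-smi -i 2,4,6 --query-gpu=index,mig.mode.current --format=csv,noheader | mig_still_enabled

# Self-contained demonstration with hand-written sample input.
printf '2, Disabled\n4, Enabled\n6, Disabled\n' | mig_still_enabled
```

An empty result means every listed GPU reports MIG as disabled.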


