Question 38
Before running a large multi-day training job, you want to verify that all NVLink connections between GPUs pass bandwidth and error checks, and that PCIe Gen5 links are running at full width.
Correct answer: A
Explanation
`dcgmi diag` runs the NVIDIA Data Center GPU Manager (DCGM) diagnostics, which include GPU interconnect tests for NVLink bandwidth and error checking before long training jobs. It also reports PCIe link status, letting you verify that PCIe Gen5 links are operating at full width.
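As a rough illustration of such a pre-flight check, the sketch below runs `dcgmi diag` and then cross-checks the negotiated PCIe link with `nvidia-smi`. It rests on several assumptions: the helper names `degraded_links` and `preflight` are hypothetical, run level 3 is an arbitrary choice, the Gen5 x16 target comes from the question rather than from any tool default, and a nonzero `dcgmi` exit code is treated as a failed diagnostic. Note also that some GPUs lower the link generation at idle to save power, so a reading taken on an idle system may look worse than the link really is.

```python
import csv
import io
import shutil
import subprocess

# nvidia-smi query fields: GPU index, current PCIe generation, current lane width.
QUERY = "index,pcie.link.gen.current,pcie.link.width.current"


def degraded_links(csv_text, want_gen=5, want_width=16):
    """Return (index, gen, width) for each GPU whose link is below the target."""
    bad = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        try:
            index, gen, width = (int(field.strip()) for field in row)
        except ValueError:
            continue  # skip values such as "[N/A]" that are not integers
        if gen < want_gen or width < want_width:
            bad.append((index, gen, width))
    return bad


def preflight(run_level="3"):
    """Hypothetical helper: True if the diagnostic passes and no link is degraded."""
    if not shutil.which("dcgmi") or not shutil.which("nvidia-smi"):
        raise RuntimeError("dcgmi and nvidia-smi must both be on PATH")
    # Assumption: dcgmi exits nonzero when a diagnostic test fails.
    diag = subprocess.run(["dcgmi", "diag", "-r", run_level])
    smi = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return diag.returncode == 0 and not degraded_links(smi.stdout)
```

Calling `preflight()` on a GPU node before submitting the job gives a single pass/fail signal. The parsing step is kept separate from the subprocess calls so it can be exercised on captured output without GPU hardware.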
Why each option is right or wrong
A. dcgmi diag
`dcgmi diag` is the NVIDIA Data Center GPU Manager diagnostic command, and its built-in tests are designed to validate interconnect health before production workloads. In particular, the diagnostic suite includes NVLink bandwidth and error checks, and it reports PCIe link information such as negotiated generation and lane width, so you can confirm Gen5 links are operating at x16 rather than a reduced width.
B. nvidia-bug-report
C. gpu-burn