Question 2
Domain 3: NVIDIA Tools, Performance, and Deployment
What is the main benefit of model quantization?
Correct answer: B
Explanation
Model quantization lowers the precision of weights and activations, so the model stores and moves fewer bits. That reduces memory usage and often speeds up computation, which lowers inference latency.
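The effect on memory can be illustrated with a minimal sketch of symmetric per-tensor INT8 quantization, using only the Python standard library. The function names quantize_int8 and dequantize, the random weights, and the choice of a single scale factor are assumptions made for illustration; production toolchains typically use more elaborate calibration and per-channel scales.

```python
from array import array
import random


def quantize_int8(weights):
    """Map float values to int8 with one shared scale (symmetric, per-tensor)."""
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude maps to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 values from the int8 codes."""
    return array("f", (v * scale for v in q))


# Hypothetical "layer": 4096 float32 weights drawn from a normal distribution.
random.seed(0)
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(4096)))

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(weights) * weights.itemsize  # 4 bytes per float32 value
int8_bytes = len(q) * q.itemsize              # 1 byte per int8 value
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("float32 bytes:", fp32_bytes)
print("int8 bytes:   ", int8_bytes)
print("max round-trip error:", max_error, "(bounded by about scale / 2 =", scale / 2, ")")
```

The INT8 copy occupies a quarter of the bytes of the float32 original, and the round-trip error stays within about half a quantization step. That small error is the accuracy cost traded for the lower memory use and bandwidth.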
Why each option is right or wrong
A. Increasing model accuracy
B. Reducing memory usage and inference latency
Quantization replaces higher-precision parameters and activations with lower-bit representations, so the model occupies less storage and requires less memory bandwidth during inference. In practice, moving from 32-bit floating point to 8-bit integers (INT8) or similar formats cuts weight storage by roughly 4x and usually shortens inference time, because less data moves through memory and hardware with native low-precision support can process more values per operation.
C. Improving training stability
D. Enhancing model interpretability