NCA-GENL Practice Q2

A. Increasing model accuracy

B. Reducing memory usage and inference latency

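Quantization replaces higher-precision parameters and activations with lower-bit representations, so the model occupies less storage and needs less memory bandwidth during inference. In practice, moving from 32-bit floating point to an 8-bit format cuts weight storage to roughly a quarter and usually shortens inference time, because less data moves through memory and many accelerators execute low-precision arithmetic faster than 32-bit arithmetic. The trade-off is a possible small loss of accuracy, which is why quantization is not a technique for increasing accuracy (option A).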
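The memory side of this claim can be checked with a few lines of Python. The sketch below is illustrative only: it assumes a simple symmetric per-tensor int8 scheme, uses standard-library arrays as a stand-in for real model tensors, and the names quantize_int8 and dequantize are hypothetical helpers, not part of any framework. It measures storage and rounding error, not latency.

```python
from array import array
import random


def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme): map floats to [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round each value to the nearest step and clamp to the int8 range.
    quantized = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float32 values from the int8 codes."""
    return array("f", (q * scale for q in quantized))


# Hypothetical "weights": 4096 random values stored as 32-bit floats.
random.seed(0)
weights = array("f", (random.gauss(0.0, 1.0) for _ in range(4096)))

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

fp32_bytes = len(weights) * weights.itemsize
int8_bytes = len(q_weights) * q_weights.itemsize
ratio = fp32_bytes / int8_bytes

# Worst-case rounding error introduced by quantization.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes, ratio: {ratio:.1f}x")
print(f"scale: {scale:.5f}, max abs error: {max_error:.5f}")
```

The int8 copy needs one byte per value instead of four, and the reconstruction error stays within about half a quantization step. That small error is the accuracy cost paid for the smaller footprint.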

C. Improving training stability

D. Enhancing model interpretability

Explanation

Why each option is right or wrong