Question 12
How does dropout regularization work during neural network training?
Correct answer: B
Explanation
Dropout randomly zeroes a fraction of activations during training, so the network cannot rely on any single pathway. This prevents co-adaptation, forcing neurons to learn more robust, distributed features, which reduces overfitting.
Why each option is right or wrong
A. Removing the most important features from the dataset before training
B. Randomly zeroing a fraction of activations during training to prevent co-adaptation
During training, a dropout layer samples a Bernoulli mask and sets a specified proportion of hidden-unit outputs to 0 on each forward pass, so a different subnetwork is trained on each pass. The dropout rate is the fraction dropped (for example, 0.5 means that on average half the activations are zeroed), which breaks reliance on any one feature pathway and reduces co-adaptation. At inference, the full network is used. In the original formulation, activations are then scaled by the keep probability; most modern frameworks instead use inverted dropout, which scales the kept activations by 1/(keep probability) during training so that inference needs no scaling.
C. Reducing the learning rate to make weight updates smaller
D. Pruning the network at inference time to remove unused neurons
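The mechanism described under option B can be illustrated with a minimal sketch in plain Python. The function name `dropout_forward` and its parameters are hypothetical, chosen only for this illustration. The sketch uses the inverted variant, which rescales kept activations by 1/keep_prob during training so that inference needs no scaling; that is a choice made for the sketch, not something the question specifies.

```python
import random


def dropout_forward(activations, rate, training, rng):
    """Illustrative inverted dropout over a flat list of activations.

    `rate` is the fraction of activations to zero, on average.
    `rng` is a random.Random instance so the mask is reproducible.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Inference: the full network is used, with no mask and no scaling.
        return list(activations)
    keep_prob = 1.0 - rate
    out = []
    for a in activations:
        # One Bernoulli(keep_prob) draw per activation.
        keep = rng.random() < keep_prob
        # Kept activations are scaled up so the expected value is unchanged.
        out.append(a / keep_prob if keep else 0.0)
    return out


rng = random.Random(0)
hidden = [1.0] * 10_000

train_out = dropout_forward(hidden, rate=0.5, training=True, rng=rng)
eval_out = dropout_forward(hidden, rate=0.5, training=False, rng=rng)

# Roughly half the activations are zeroed in training mode,
# while the mean stays close to the original value of 1.0.
dropped_fraction = train_out.count(0.0) / len(train_out)
train_mean = sum(train_out) / len(train_out)
```

Because each call in training mode draws a fresh mask, two consecutive forward passes over the same input generally zero different units, which is what prevents any fixed group of neurons from co-adapting. In evaluation mode the input passes through unchanged.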