AIP-210 Practice Q12

A. Removing the most important features from the dataset before training

B. Randomly zeroing a fraction of activations during training to prevent co-adaptation

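During training, a dropout layer samples a Bernoulli mask on each forward pass and sets each hidden-unit output to 0 with a specified probability, so a different subnetwork is trained on each mini-batch. The dropout rate is the expected fraction removed (for example, 0.5 means roughly half the activations are suppressed), which breaks reliance on any single feature pathway and reduces co-adaptation. At inference, no units are dropped and the full network is used. In the original formulation, activations are scaled by the keep probability at inference; the inverted-dropout variant common in modern frameworks instead divides the kept activations by the keep probability during training and leaves inference unscaled.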

C. Reducing the learning rate to make weight updates smaller

D. Pruning the network at inference time to remove unused neurons
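
The dropout mechanism described under option B can be sketched in a few lines of plain Python. This is a minimal illustration, not any framework's implementation: the function names dropout_train and dropout_inference are hypothetical, and the sketch assumes the inverted-dropout variant, where kept activations are rescaled during training so that inference needs no scaling.

```python
import random


def dropout_train(activations, rate, rng):
    """Apply inverted dropout to a list of activations (training mode).

    Each activation is zeroed independently with probability `rate`.
    Survivors are divided by the keep probability so the expected
    value of each output matches the original activation.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    out = []
    for a in activations:
        keep = rng.random() < keep_prob  # one Bernoulli(keep_prob) draw per unit
        out.append(a / keep_prob if keep else 0.0)
    return out


def dropout_inference(activations):
    """At inference nothing is dropped; with inverted dropout no scaling is needed."""
    return list(activations)


# Hypothetical usage: 10,000 hidden units that all output 1.0.
rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout_train(hidden, rate=0.5, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)  # close to the rate, 0.5
mean_after = sum(dropped) / len(dropped)           # close to the original mean, 1.0
```

Because the mask is sampled anew on every call, the fraction of zeroed units only matches the dropout rate on average, and each training step effectively trains a different subnetwork.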

Question 12

Explanation

Why each option is right or wrong