Question 11
Domain 2: Core Machine Learning, AI, and Transformer Foundations
What is the key innovation of the Transformer architecture over RNNs?
Correct answer: B
Explanation
Transformers replace recurrent, step-by-step computation with self-attention, so every token of an input sequence can be processed at once rather than one time step at a time. This "parallel processing capability" is what sets them apart from RNNs, which must consume a sequence in order because each hidden state depends on the previous one.
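The contrast can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: it assumes a single attention head with no masking and no positional encoding, uses small random weights, and every name in it (`rnn_forward`, `attend`, `self_attention`, `rand_matrix`) is hypothetical.

```python
import math
import random

def matvec(vec, mat):
    """Multiply a length-n row vector by an n x m matrix (list of rows)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rnn_forward(xs, W_x, W_h):
    """tanh RNN: the state at step t cannot be computed before step t-1."""
    h = [0.0] * len(W_h)
    states = []
    for x in xs:  # inherently sequential: each pass reads the previous h
        pre = [a + b for a, b in zip(matvec(x, W_x), matvec(h, W_h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def attend(i, qs, ks, vs):
    """Attention output for position i; needs no output from other positions."""
    scale = math.sqrt(len(ks[0]))
    weights = softmax([sum(a * b for a, b in zip(qs[i], k)) / scale for k in ks])
    return [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(len(vs[0]))]

def self_attention(xs, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention without a mask."""
    qs = [matvec(x, W_q) for x in xs]
    ks = [matvec(x, W_k) for x in xs]
    vs = [matvec(x, W_v) for x in xs]
    # The iterations are independent, so they could run in any order or in parallel.
    return [attend(i, qs, ks, vs) for i in range(len(xs))]

def rand_matrix(rows, cols):
    """Hypothetical helper: small Gaussian weights for the demo."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
seq_len, d_model = 5, 4
xs = rand_matrix(seq_len, d_model)  # one row per token
W_x, W_h, W_q, W_k, W_v = (rand_matrix(d_model, d_model) for _ in range(5))

rnn_states = rnn_forward(xs, W_x, W_h)
attn_out = self_attention(xs, W_q, W_k, W_v)
print(len(rnn_states), len(attn_out))  # prints "5 5"
```

In `rnn_forward` the loop body reads the `h` written by the previous pass, so the steps cannot be reordered. In `self_attention` each call to `attend` depends only on the shared queries, keys, and values, which is why real implementations can compute all positions in one batched matrix multiplication.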
Why each option is right or wrong
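A. Better memory efficiency
Incorrect. Self-attention compares every position with every other position, so its memory cost grows quadratically with sequence length. Memory efficiency is not what the Transformer gains over RNNs.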
B. Parallel processing capability
Correct. The defining architectural change is that the Transformer removes recurrence and applies self-attention over the full sequence, so the model does not have to wait for a hidden state update at each time step. In the original Transformer paper (Vaswani et al., 2017, "Attention Is All You Need"), this lets all positions of a sequence be computed simultaneously during training and when encoding a given input. Autoregressive generation still produces output tokens one at a time, whereas an RNN is inherently sequential across time steps even during training.
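C. Smaller model size
Incorrect. Removing recurrence does not by itself reduce the parameter count, and Transformer models are often very large. Model size is not the architectural innovation.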
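D. Faster inference only
Incorrect. The gain from parallelism is not limited to inference. It applies most directly to training, where all positions of a sequence are processed together, so the word "only" makes this option wrong.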