Question 15
Domain 2: Core Machine Learning, AI, and Transformer Foundations
In transformer models, how does the embedding dimension (d_model) affect model capacity and performance?
Correct answer: C
Explanation
The embedding dimension, or d_model, sets the size of each token vector, so it determines how much information can be encoded in each token representation. A larger d_model increases model capacity by allowing richer features and relationships to be stored, which can improve performance when enough data and compute are available.
Why each option is right or wrong
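A. Embedding dimension should always be set to the same value as vocabulary size to maintain a one-to-one mapping between tokens and their vector representations
Incorrect. Vocabulary size and d_model are independent hyperparameters: the embedding table maps each of the vocabulary's tokens to a dense vector of length d_model, and d_model is typically far smaller than the vocabulary size.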
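B. Larger embedding dimensions always lead to better performance without any trade-offs, as more dimensions provide strictly superior representational capacity for all tasks
Incorrect. A larger d_model raises parameter count, memory use, and compute, and the extra capacity only helps when there is enough data and training budget to use it.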
C. The embedding dimension determines how much information can be encoded in each token representation
In the transformer architecture, the token embedding and hidden-state width are fixed by the model dimension d_model, so each token is represented as a vector of that size throughout the network (e.g., Vaswani et al., 2017, "Attention Is All You Need"). A larger d_model increases representational capacity because each token can carry more features into the attention and feed-forward blocks, but it also raises parameter count and compute, so performance gains depend on having sufficient data and training budget.
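D. Embedding dimension only affects the input layer and doesn't impact the rest of the model, since subsequent layers operate on fixed-size internal representations independently
Incorrect. The fixed-size internal representation used by every layer is itself d_model wide, so changing d_model changes the size of the attention and feed-forward weights in every block, not only the input embedding.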
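To make the capacity and cost trade-off concrete, the following is a rough sketch of how the weight count grows with d_model. It assumes a standard transformer block with four d_model x d_model attention projections (Q, K, V, output) and a two-layer feed-forward network whose hidden width is 4 x d_model, and it ignores biases, layer norms, and positional embeddings. The function name and the example settings (6 layers, a 32,000-token vocabulary) are hypothetical and chosen only for illustration.

```python
def transformer_param_estimate(d_model, n_layers, vocab_size, ffn_mult=4):
    """Rough weight count for a transformer stack.

    Assumptions (illustrative only): no biases, no layer norms,
    no positional embeddings, feed-forward width = ffn_mult * d_model.
    """
    # Embedding table: one d_model-sized vector per vocabulary entry.
    embedding = vocab_size * d_model
    # Attention: Q, K, V, and output projections, each d_model x d_model.
    attention = 4 * d_model * d_model
    # Feed-forward: d_model -> d_ff and d_ff -> d_model.
    d_ff = ffn_mult * d_model
    feed_forward = 2 * d_model * d_ff
    per_layer = attention + feed_forward
    return {
        "embedding": embedding,
        "per_layer": per_layer,
        "total": embedding + n_layers * per_layer,
    }


if __name__ == "__main__":
    for d_model in (256, 512, 1024):
        est = transformer_param_estimate(d_model, n_layers=6, vocab_size=32000)
        print(f"d_model={d_model}: {est}")
```

Under these assumptions, doubling d_model doubles the embedding table but quadruples the weights in every layer, which is why the choice of d_model affects the whole network (contrary to option D) and why larger values carry a real cost (contrary to option B).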