NCA-GENL Practice Q15

A. Embedding dimension should always be set to the same value as vocabulary size to maintain a one-to-one mapping between tokens and their vector representations

B. Larger embedding dimensions always lead to better performance without any trade-offs, as more dimensions provide strictly superior representational capacity for all tasks

C. The embedding dimension determines how much information can be encoded in each token representation

In the original Transformer, the token embeddings and the hidden states of every layer share a single width, the model dimension d_model, so each token is represented as a d_model-sized vector throughout the network (Vaswani et al., 2017, "Attention Is All You Need"). A larger d_model increases representational capacity, because each token carries more features into the attention and feed-forward blocks, but it also raises parameter count and compute, so any performance gain depends on having enough data and training budget.

D. Embedding dimension only affects the input layer and doesn't impact the rest of the model, since subsequent layers operate on fixed-size internal representations independently
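
The trade-off described under option C can be made concrete with a rough parameter count. The sketch below is only an estimate under stated assumptions: the function name transformer_param_estimate is hypothetical, the defaults (32,000-token vocabulary, 12 layers, feed-forward width of 4 x d_model) are illustrative rather than taken from the question, and biases, layer norms, and positional embeddings are ignored.

```python
def transformer_param_estimate(d_model, vocab_size=32_000, n_layers=12, ffn_mult=4):
    """Rough weight count for a standard Transformer stack (illustrative only)."""
    # Embedding table: one d_model-sized row per vocabulary entry.
    # vocab_size and d_model are independent choices (contrast with option A).
    embedding = vocab_size * d_model
    # Attention: query, key, value, and output projections, each d_model x d_model.
    attention = 4 * d_model * d_model
    # Feed-forward: up-projection to ffn_mult * d_model, then back down to d_model.
    feed_forward = 2 * d_model * (ffn_mult * d_model)
    per_layer = attention + feed_forward
    return {
        "embedding": embedding,
        "per_layer": per_layer,
        "total": embedding + n_layers * per_layer,
    }


if __name__ == "__main__":
    for d_model in (256, 512, 1024):
        est = transformer_param_estimate(d_model)
        print(f"d_model={d_model:>5}  embedding={est['embedding']:>12,}  "
              f"per_layer={est['per_layer']:>12,}  total={est['total']:>14,}")
```

Under these assumptions, doubling d_model doubles the embedding table but roughly quadruples the weights in every layer. That is why a wider embedding is not free (contrast with option B) and why its effect is not confined to the input layer (contrast with option D).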

Question 15

Explanation

Why each option is right or wrong