Question 11
Domain 2: Data Preparation
After changing the response-generating LLM in a RAG pipeline from GPT-4 to a self-hosted model with a shorter context length, the Generative AI Engineer is getting the following error: What TWO solutions should the Generative AI Engineer implement without changing the response-generating model? (Choose two.)
Correct answer: C
Explanation
A shorter context window means the assembled RAG prompt (instructions, user question, and retrieved context) can exceed the model’s input limit, so the engineer must reduce what is sent to the generator. Typical fixes are to truncate or summarize the retrieved context, to embed smaller chunks, and to retrieve fewer chunks, so that the prompt fits within the self-hosted model’s context length.
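As a rough illustration of keeping the prompt within the limit, the sketch below keeps retrieved chunks in ranked order until a token budget is used up. The function name, its parameters, and the whitespace-based token count are all assumptions made for this example; a real pipeline would count tokens with the tokenizer of the self-hosted model.

```python
def fit_chunks_to_budget(chunks, max_context_tokens, reserved_tokens, count_tokens=None):
    """Keep retrieved chunks, in ranked order, until the token budget is spent.

    chunks: retrieved passages, best match first.
    max_context_tokens: context length of the generator model.
    reserved_tokens: tokens set aside for instructions, the question, and the answer.
    count_tokens: hypothetical token counter; defaults to a crude whitespace split.
    """
    if count_tokens is None:
        # Assumption: whitespace words approximate tokens; swap in the model's tokenizer.
        count_tokens = lambda text: len(text.split())

    budget = max_context_tokens - reserved_tokens
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # the next chunk would overflow the context window
        kept.append(chunk)
        used += cost
    return kept
```

Smaller chunks (option A) lower the cost of each passage, and retrieving fewer records (option B) shortens the list passed in, so both changes make it easier to stay under the budget.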
Why each option is right or wrong
A. Decrease the chunk size of embedded documents
Smaller chunks reduce the number of tokens in each retrieved passage, but this is only one of the two needed fixes.
B. Reduce the number of records retrieved from the vector database
Retrieving fewer records lowers the total prompt size, but on its own it does not cover both required remedies.
C. All of the above
Options A and B are both valid remedies, and the question asks for two, so the choice that covers both is correct.