Question 40
Domain 4: Implement Natural Language Processing SolutionsA company builds a multilingual custom translation model for technical manuals. After training on 5,000 sentence pairs, the BLEU score is 28 (target: 45+). Training data is English to French. What is the most effective improvement?
Correct answer: B
Explanation
BLEU improves when a model has more aligned parallel data to learn translation patterns, especially in a specialized domain like technical manuals. Expanding from "5,000 sentence pairs" to "50,000+" and making sure "domain-specific terminology is well-represented" gives the model more examples of the exact language it must translate, which is the most effective way to raise score.
Why each option is right or wrong
A. Switch to a different Azure region for better translation performance
B. Add more high-quality parallel sentence pairs (aim for 50,000+) and ensure domain-specific terminology is well-represented
BLEU is an evaluation metric for machine translation, and a score of 28 on only 5,000 aligned English–French sentence pairs indicates the model is undertrained for the target domain. In practice, the most effective fix is to increase the size and quality of the parallel corpus substantially—on the order of tens of thousands of sentence pairs, here 50,000+—because the model needs more aligned examples to learn technical phrasing and terminology consistently. For a custom translation system, domain coverage matters as much as volume, so ensuring manuals’ specialized terms are well represented directly addresses the weak BLEU performance.
C. Enable document translation mode instead of text translation
D. Reduce the training data to 1,000 highest-quality pairs