NCP-AAI Practice Q3

A. Summarization: ROUGE, faithfulness, compression ratio. QA: Exact match, F1, answer relevance. Creative writing: Diversity, coherence, user preference.

The metric set matches the evaluation objective of each task. Summarization is typically scored with ROUGE for n-gram overlap against a reference summary, with faithfulness checking factual consistency with the source and compression ratio measuring how much the source was condensed. For QA, exact match and F1 are the standard reference-based scores, and answer relevance checks whether the response actually addresses the question. For creative writing, diversity, coherence, and user preference are appropriate because there is usually no single gold answer and quality is judged more subjectively.

B. Use accuracy for all tasks.

C. Use BLEU for all tasks.

D. Use user satisfaction for all tasks.
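
Several of the metrics named in option A are simple enough to sketch directly. The snippet below is a minimal illustration, not a reference implementation: the function names are hypothetical, normalization is reduced to lowercasing and whitespace splitting, compression ratio is assumed to mean summary tokens divided by source tokens (other conventions invert this), and diversity is approximated with a distinct-n score, which is only one common choice.

```python
from __future__ import annotations

from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """QA: 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """QA: harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def compression_ratio(source: str, summary: str) -> float:
    """Summarization: summary length over source length, in tokens (assumed convention)."""
    src = normalize(source)
    return len(normalize(summary)) / len(src) if src else 0.0


def distinct_n(texts: list[str], n: int = 2) -> float:
    """Creative writing: share of unique n-grams across outputs (a diversity proxy)."""
    ngrams = []
    for text in texts:
        tokens = normalize(text)
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

For example, `token_f1("the eiffel tower", "eiffel tower")` gives precision 2/3 and recall 1, so F1 is 0.8 while exact match is 0.0. Faithfulness, answer relevance, coherence, and user preference are not shown because they usually need a model-based judge or human ratings rather than string overlap.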
