Question 15
Content Domain 3: Modeling
A machine learning team has already measured model quality on a held-out validation dataset, but now wants to determine whether the new model improves actual user behavior after deployment. Which evaluation approach should the team use next?
Correct answer: C
Explanation
Offline evaluation measures model performance on existing datasets, while online evaluation tests real-world impact in production, often through A/B testing. Use online evaluation when the goal is to measure how a model changes live user outcomes.
Topic: Perform offline and online model evaluation (A/B testing).
Key Terms: offline evaluation, online evaluation, A/B testing.
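To make the online half of this concrete, here is a minimal sketch of how the outcome of an A/B test might be analyzed. It assumes the metric is a simple conversion rate and uses a pooled two-proportion z-test, which is one common choice and not the only valid one. The function name and the counts in the usage note are hypothetical illustrations, not data from the question.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns (absolute lift of B over A, z statistic, two-sided p-value).
    Assumes independent users and samples large enough for the
    normal approximation to hold.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value
```

For example, calling `two_proportion_z_test(1000, 10000, 1100, 10000)` with made-up counts compares a 10% control rate against an 11% treatment rate. A small p-value suggests the difference in live behavior is unlikely to be noise, which is the kind of evidence an offline validation score cannot provide.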
Why each option is right or wrong
A. Run another offline evaluation on the same validation dataset
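Offline evaluation uses existing data rather than live production behavior, so repeating it on the same validation dataset reveals nothing new about how users respond to the model.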
B. Replace the model for all users and observe overall metrics
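A full rollout leaves no concurrent baseline, so a change in overall metrics cannot be attributed to the model. A/B testing compares live outcomes between alternatives instead of switching everyone at once.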
C. Conduct an online evaluation using an A/B test in production
The team has already completed offline evaluation on a held-out dataset and now wants to measure whether the new model changes actual user behavior after deployment. Online evaluation addresses live production impact, and A/B testing is the named method for comparing the new model against a baseline in that setting.
D. Skip evaluation because the held-out dataset already showed good performance
Offline results do not establish real-world production impact on users, so good held-out performance does not answer the team's question.
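The contrast between options B and C comes down to how traffic is split. The sketch below shows one possible way to assign users to a control or treatment group deterministically by hashing the user ID. The experiment name, the 50/50 default share, and the function name are assumptions made for illustration; production systems typically rely on a managed experimentation or traffic-routing service instead.

```python
import hashlib


def assign_variant(user_id, experiment="new-model-test", treatment_share=0.5):
    """Deterministically map a user to 'control' or 'treatment'.

    Hashing the experiment name together with the user ID keeps each
    user in the same group on every request, without storing state.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a uniform value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"
```

Users in the control group keep being served by the existing model while the treatment group sees the new one, so both groups are measured over the same period. Setting `treatment_share=1.0` would correspond to option B, where no baseline group remains for comparison.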