Question 25
Domain 6: Evaluation and Monitoring
A team keeps changing prompts, model versions, and judge rubrics during experiments. Which practice best preserves reproducibility?
Correct answer: B
Explanation
MLflow runs preserve experiment lineage by recording the prompts, configs, artifacts, and metrics used in each run. When prompts, model versions, and judge rubrics change between experiments, logging them with every run creates a record of the exact experiment state, so results can be traced, compared, and reproduced later.
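As a minimal sketch of the logging described above, the Python below gathers a prompt, judge rubric, model identifier, and config into one record and writes it to an MLflow run. The helper names (fingerprint, build_run_record, log_run), the experiment name, the parameter keys, and the artifact file names are illustrative assumptions, not part of the question or of MLflow itself. Full prompt and rubric text is stored as artifacts, with short hashes as parameters, on the assumption that long text is a poor fit for a parameter value.

```python
import hashlib
import json


def fingerprint(text: str) -> str:
    """Return a short, stable SHA-256 fingerprint of a prompt or rubric."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def build_run_record(prompt: str, rubric: str, model_id: str, config: dict) -> dict:
    """Collect everything that can change between experiments (hypothetical helper)."""
    params = {
        "model_id": model_id,
        "prompt_sha": fingerprint(prompt),
        "rubric_sha": fingerprint(rubric),
    }
    # Prefix config keys so they cannot collide with the keys above.
    params.update({f"cfg.{key}": value for key, value in sorted(config.items())})
    artifacts = {
        "prompt.txt": prompt,
        "judge_rubric.txt": rubric,
        "config.json": json.dumps(config, indent=2, sort_keys=True),
    }
    return {"params": params, "artifacts": artifacts}


def log_run(record: dict, metrics: dict, experiment: str = "prompt-experiments") -> str:
    """Write one record to an MLflow run and return its run ID (hypothetical helper)."""
    import mlflow  # imported lazily so build_run_record works without MLflow installed

    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        mlflow.log_params(record["params"])
        for path, text in record["artifacts"].items():
            mlflow.log_text(text, path)  # full text kept as a run artifact
        mlflow.log_metrics(metrics)
        return run.info.run_id
```

Calling log_run(build_run_record(prompt, rubric, "model-v1", {"temperature": 0.2}), {"judge_score": 4.1}) would then produce one run per trial, where the model name, config, and score are placeholder values. Two runs with different prompt_sha values but the same rubric_sha differ only in the prompt, which makes their metrics directly comparable.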
Why each option is right or wrong
A. Save only the best final score in a spreadsheet
A final score omits the inputs, settings, and artifacts needed to reproduce the result.
B. Log prompts, configs, artifacts, and metrics in MLflow runs
MLflow run tracking records parameters/configs, artifacts, and metrics under a unique run ID, so each trial can be compared or re-run even when prompts, model versions, and evaluation rubrics change between runs. In practice, that means logging the exact prompt text, model/version identifiers, judge settings, and outputs in the run record rather than relying on ad hoc notes. That record is what preserves lineage and reproducibility across iterations.
C. Turn off all tracing to keep experiments clean
Disabling tracing removes useful experiment history and makes comparisons harder, not cleaner.
D. Rely on notebook comments to remember what changed
Notebook comments are informal and incomplete; they do not provide reliable experiment lineage.