Question 3
Domain 1 — AI Governance and Risk Management
"Data leakage" in model development refers to:
Correct answer: B
Explanation
Data leakage occurs when information from the future or from the test set enters training, so the model learns patterns it would not have access to at prediction time. This contamination of the training process makes evaluation results look better than real-world performance, which is why option B describes it as “artificially inflating model performance.”
Why each option is right or wrong
A. Training data being exposed to unauthorized users through a security breach
B. Future or test-set information contaminating the training process, artificially inflating model performance
In model development, leakage occurs when information that would not be available at prediction time is used during fitting or feature engineering, so the evaluation is no longer a valid estimate of out-of-sample performance. In this question, the contaminating information comes from the future or the held-out test set, which lets the model indirectly learn the answer and therefore produces unrealistically high validation or test scores.
C. Loss of training data due to pipeline infrastructure failures
D. A regulatory breach of data handling and retention procedures