Question 39
Domain 5
Having achieved an impressive AUC ROC value of 99% for training data on a classification problem with time series data, after only a few experiments, it is now time to identify and fix any potential problems. To do this, the next step should be to explore advanced algorithms and invest time in hyperparameter tuning. What other measures can be taken to ensure the model's success?
Correct answer: D
Explanation
Time series models can look excellent on training data while still failing on unseen data if information from the future leaks into training. Nested cross-validation helps prevent this by separating model selection from evaluation, so you can "mitigate data leakage" and get a more reliable estimate of performance before tuning further.
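To make the leakage idea concrete, here is a minimal sketch in plain Python. It assumes row index equals time step, and the helper `leaks_future` and the 20-row toy series are hypothetical names introduced only for illustration. An interleaved hold-out stands in for a shuffled split, so the result is deterministic.

```python
def leaks_future(train_idx, test_idx):
    """Return True if any training row is at or after the earliest test row."""
    return max(train_idx) >= min(test_idx)

# Hypothetical toy series: 20 observations, index = time step.
indices = list(range(20))

# Interleaved hold-out (stand-in for a shuffled split): every 4th row is tested,
# so the model trains on rows that come after rows it is evaluated on.
interleaved_test = indices[::4]
interleaved_train = [i for i in indices if i not in interleaved_test]

# Chronological hold-out: train on the first 75%, test on the most recent 25%.
chrono_train, chrono_test = indices[:15], indices[15:]

print(leaks_future(interleaved_train, interleaved_test))  # True
print(leaks_future(chrono_train, chrono_test))            # False
```

Only the chronological split keeps every training row strictly before every test row, which is the property a time series evaluation needs.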
Why each option is right or wrong
A. Combat data leakage by eliminating features highly correlated with the target value.
B. Combat model overfitting by employing a simpler algorithm.
C. Combat model overfitting by adjusting the hyperparameters to decrease the AUC ROC value.
D. Mitigate data leakage by utilizing nested cross-validation during model training.
In time-series classification, a 99% AUC ROC on training data after only a few trials is a classic red flag for leakage or overly optimistic validation, because ordinary random splits can let future observations influence the fitted model. Nested cross-validation is the appropriate safeguard: the inner loop is used for hyperparameter selection, while the outer loop provides a far less biased performance estimate, reducing selection bias and preventing information from the test fold from contaminating model choice. For time series, this holds only if the folds in both loops respect time order, so that each model is trained on observations that precede the ones it is evaluated on.
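The inner and outer loops described above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the question's prescribed implementation: the data is synthetic, the rows are simply treated as time-ordered, and the choice of `TimeSeriesSplit`, `LogisticRegression`, and the `C` grid is ours rather than the source's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, cross_val_score

# Synthetic stand-in data (assumption): 600 rows treated as time-ordered.
rng = np.random.default_rng(0)
n_samples = 600
X = rng.normal(size=(n_samples, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)

# Both loops use forward-chaining splits, so training rows always precede test rows.
inner_cv = TimeSeriesSplit(n_splits=3)  # hyperparameter selection
outer_cv = TimeSeriesSplit(n_splits=4)  # performance estimation

# Inner loop: pick the regularization strength C by AUC ROC.
search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=inner_cv,
)

# Outer loop: each outer test fold is never seen during the inner search.
outer_scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
print(len(outer_scores), outer_scores.mean())
```

The mean of `outer_scores` is the figure to compare against the 99% training AUC ROC. A large gap between the two would point to leakage or overfitting rather than genuine predictive skill.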