Question 33
Content Domain 3: Modeling
A web-based company wants to improve its conversion rate on its landing page. Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network algorithm on Amazon SageMaker. However, there is an overfitting problem: training data shows 90% accuracy in predictions, while test data shows 70% accuracy only. The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases. Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?
Correct answer: C
Explanation
Overfitting is shown by the gap between "90% accuracy" on training data and "70% accuracy" on test data, so the model needs regularization to improve generalization. L1/L2 regularization penalizes large weights, and dropout randomly disables neurons during training. Both reduce overfitting, which makes them the recommended way to raise accuracy on validation and test data.
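As a rough sketch of what "penalizes large weights" means, assume a generic data loss, network weights w_i, and a regularization strength lambda. These symbols are introduced here for illustration and do not come from the question:

$$
\mathcal{L}_{\text{L1}} = \mathcal{L}_{\text{data}} + \lambda \sum_i |w_i|,
\qquad
\mathcal{L}_{\text{L2}} = \mathcal{L}_{\text{data}} + \lambda \sum_i w_i^2
$$

A larger lambda pushes the weights toward zero more strongly. L1 tends to drive some weights exactly to zero, while L2 shrinks all weights smoothly.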
Why each option is right or wrong
A. Increase the randomization of training data in the mini-batches used in training.
Shuffling mini-batches can help optimization, but on its own it usually does not fix strong overfitting.
B. Allocate a higher proportion of the overall data to the training dataset.
Shifting more of a fixed dataset into training leaves less for validation and testing, which makes the accuracy estimate less reliable, and it does not directly reduce overfitting.
C. Apply L1 or L2 regularization and dropouts to the training.
A 20-point gap between training and test accuracy is classic overfitting, so the remedy is to constrain model complexity during training rather than to optimize the training fit further. L1/L2 regularization adds a penalty term to the loss function, and dropout randomly deactivates units during training. Both reduce variance and improve performance on held-out data, so in this scenario they are the appropriate path to better validation and test accuracy before production deployment.
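The dropout half of this option can be illustrated with a minimal, framework-free sketch. It assumes the common "inverted dropout" formulation, and the function name `dropout` and its parameters are hypothetical, not a SageMaker or framework API. In practice a deep learning framework's built-in dropout layer would be used instead.

```python
import random


def dropout(activations, rate, training, rng=random):
    """Inverted dropout (illustrative sketch, not a framework API).

    During training, each unit is zeroed with probability `rate` and the
    surviving units are scaled by 1 / (1 - rate) so the expected activation
    stays the same. At inference time the input passes through unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    hidden = [0.2, 1.5, 0.7, 0.9]
    print("training :", dropout(hidden, rate=0.5, training=True, rng=rng))
    print("inference:", dropout(hidden, rate=0.5, training=False))
```

Because a different random subset of units is silenced on each training step, the network cannot rely on any single unit, which is why dropout tends to narrow the gap between training and test accuracy.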
D. Reduce the number of layers and units (or neurons) from the deep learning network.
A smaller network has less capacity to overfit, but shrinking the architecture is a blunt change compared with targeted regularization.