Question 7
Domain 1: Data Preparation for Machine Learning (ML)
A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine. The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data. Which solution will meet these requirements with the LEAST operational overhead?
Correct answer: C
Explanation
Amazon Macie is designed to "identify and protect sensitive data" in Amazon S3, so it can find the sensitive information with minimal management. Using AWS Lambda to remove the data automates remediation without standing up and maintaining a separate processing platform, meeting the least operational overhead requirement.
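The identification step can be illustrated with a short boto3 sketch that starts a one-time Macie sensitive data discovery job over the bucket the model reads from. This is a minimal sketch, not a definitive setup. It assumes Macie is already enabled in the account. The function names, the bucket name, and the job name are hypothetical. Only the `macie2` client's `create_classification_job` call and its `clientToken`, `jobType`, `name`, and `s3JobDefinition` parameters come from the AWS SDK.

```python
import uuid


def build_classification_job_request(account_id, bucket_names, job_name):
    """Build keyword arguments for the macie2 CreateClassificationJob API (hypothetical helper)."""
    return {
        # Idempotency token so that a retried request does not create a duplicate job.
        "clientToken": str(uuid.uuid4()),
        # ONE_TIME scans once; SCHEDULED would rescan the buckets periodically.
        "jobType": "ONE_TIME",
        "name": job_name,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }


def start_sensitive_data_scan(account_id, bucket_names, job_name, macie_client=None):
    """Start a Macie sensitive data discovery job and return its job ID (hypothetical helper)."""
    if macie_client is None:
        # Imported lazily so the request builder works without boto3 installed.
        import boto3
        macie_client = boto3.client("macie2")
    request = build_classification_job_request(account_id, bucket_names, job_name)
    response = macie_client.create_classification_job(**request)
    return response["jobId"]
```

For example, `start_sensitive_data_scan("111122223333", ["example-chat-data"], "chat-data-pii-scan")` would submit one job against a placeholder account and bucket. The optional `macie_client` parameter exists only so the logic can be exercised without calling AWS.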
Why each option is right or wrong
A. Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.
SageMaker can host the model, but migrating it off premises does not address the data problem, and custom Lambda-based detection recreates functionality that Macie already provides for S3.
B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.
ECS with Fargate and AWS Batch add compute layers to manage, and the Batch job would still need custom detection logic, which increases operational overhead for a data-discovery task.
C. Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.
Amazon Macie is the managed AWS service that discovers and classifies sensitive data in Amazon S3, so it directly addresses the identification requirement without building a custom scanner or ETL pipeline. Macie publishes its findings to Amazon EventBridge, where a rule can invoke an AWS Lambda function that remediates the affected object automatically. Because Lambda is serverless, there are no servers to manage, which keeps operational burden lower than running a separate data-processing stack.
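The remediation half of option C can be sketched as a Lambda handler invoked by an EventBridge rule that matches Macie findings (for example, an event pattern with `"source": ["aws.macie"]` and `"detail-type": ["Macie Finding"]`). This is a minimal sketch under stated assumptions, not the only valid design. It assumes the finding's `resourcesAffected.s3Bucket.name` and `resourcesAffected.s3Object.key` fields identify the flagged object. It also assumes "remove" means moving the object out of the bucket the model reads into a quarantine bucket named by a hypothetical `QUARANTINE_BUCKET` environment variable. A production pipeline might redact the sensitive fields instead.

```python
import os

# Hypothetical environment variable naming the bucket that receives flagged objects.
QUARANTINE_BUCKET_ENV = "QUARANTINE_BUCKET"


def extract_flagged_object(event):
    """Return (bucket, key) from a Macie sensitive data finding event, or None."""
    finding = event.get("detail", {})
    # Sensitive data finding types start with "SensitiveData:"; policy findings are skipped.
    if not finding.get("type", "").startswith("SensitiveData:"):
        return None
    resources = finding.get("resourcesAffected", {})
    bucket = resources.get("s3Bucket", {}).get("name")
    key = resources.get("s3Object", {}).get("key")
    if not bucket or not key:
        return None
    return bucket, key


def lambda_handler(event, context, s3_client=None):
    """Move an object flagged by Macie to a quarantine bucket, then delete the original."""
    location = extract_flagged_object(event)
    if location is None:
        return {"action": "ignored"}
    bucket, key = location
    if s3_client is None:
        # Lazy import keeps the parsing logic testable without boto3.
        import boto3
        s3_client = boto3.client("s3")
    quarantine_bucket = os.environ[QUARANTINE_BUCKET_ENV]
    # Copy first so the data is preserved for review, prefixing the key with the source bucket.
    s3_client.copy_object(
        Bucket=quarantine_bucket,
        Key=f"{bucket}/{key}",
        CopySource={"Bucket": bucket, "Key": key},
    )
    # Remove the object from the bucket the model reads from.
    s3_client.delete_object(Bucket=bucket, Key=key)
    return {"action": "quarantined", "bucket": bucket, "key": key}
```

Copy-then-delete keeps an auditable copy of what was removed. On a versioned source bucket, `delete_object` without a `VersionId` only adds a delete marker, so earlier versions would also need handling.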
D. Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.
Comprehend can detect PII in text, but it is a general natural language processing service rather than a managed S3 data-discovery service, and EC2 instances add servers to provision, patch, and scale.