Question 27
Domain 3: Deployment and Orchestration of ML Workflows
A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size. An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand. Which solution will meet these requirements with the LEAST operational overhead?
Correct answer: B
Explanation
Amazon SageMaker Asynchronous Inference is designed for large payloads and variable traffic. Each request points to an input object in Amazon S3, the endpoint queues and processes it, and the result is written back to S3, so no real-time connection has to stay open. Adding a scaling policy lets the endpoint “scale to accommodate changes in demand” with low operational overhead, and a simple script can submit one inference request per image.
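The "one request per image" script could look roughly like the sketch below. It is an illustration under stated assumptions, not a reference implementation: the bucket, prefix, and endpoint names are hypothetical, the model container is assumed to accept the image's MIME type as `ContentType`, and the boto3 clients (`s3` and `sagemaker-runtime`) are created by the caller and passed in. The calls used are `list_objects_v2` pagination and `invoke_endpoint_async`, whose response includes an `OutputLocation`.

```python
"""Sketch: submit one SageMaker Asynchronous Inference request per image.

Assumptions (hypothetical, not from the question): images live under one S3
prefix, the model container accepts the image's MIME type as ContentType, and
boto3 clients are created by the caller and passed in.
"""
import mimetypes
from typing import Any, Dict, List


def list_image_keys(s3_client: Any, bucket: str, prefix: str) -> List[str]:
    """Return all object keys under `prefix`, following S3 pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            # Skip "folder" placeholder keys that end in a slash.
            if not obj["Key"].endswith("/"):
                keys.append(obj["Key"])
    return keys


def submit_async_requests(
    runtime_client: Any, endpoint_name: str, bucket: str, keys: List[str]
) -> Dict[str, str]:
    """Queue one asynchronous inference request per S3 object.

    Returns a map from input key to the S3 URI where SageMaker will write
    the result once the queued request has been processed.
    """
    output_locations: Dict[str, str] = {}
    for key in keys:
        # Guess the MIME type from the file extension; fall back to binary.
        content_type, _ = mimetypes.guess_type(key)
        response = runtime_client.invoke_endpoint_async(
            EndpointName=endpoint_name,
            InputLocation=f"s3://{bucket}/{key}",
            ContentType=content_type or "application/octet-stream",
        )
        output_locations[key] = response["OutputLocation"]
    return output_locations
```

With real clients, this would be called along the lines of `submit_async_requests(boto3.client("sagemaker-runtime"), "image-captioning-async", "example-upload-bucket", keys)`, where both names are hypothetical. Each call returns as soon as the request is queued; the generated description appears at the returned `OutputLocation` later, which is what keeps large images from tying up a synchronous connection.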
Why each option is right or wrong
A. Create an Amazon SageMaker batch transform job to process all the images in the S3 bucket.
Batch transform runs offline over a fixed dataset. Each new set of uploads would need another job, so it does not provide elastic, on-demand inference as images arrive.
B. Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image.
Amazon SageMaker Asynchronous Inference is the SageMaker feature built for payloads that are too large for synchronous real-time inference (asynchronous requests can be up to 1 GB, compared with 6 MB for a real-time endpoint) and for workloads with variable or spiky demand. It accepts input from Amazon S3 and processes requests asynchronously, which fits images up to 50 MB. The endpoint can be attached to Application Auto Scaling so capacity adjusts automatically as request volume changes, avoiding the operational burden of managing workers or polling infrastructure.
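A scaling policy for such an endpoint might be configured as in the sketch below. It assumes an Application Auto Scaling client (for example `boto3.client("application-autoscaling")`) passed in by the caller; the endpoint name, the variant name `AllTraffic`, the capacity bounds, and the backlog target are hypothetical values chosen for illustration. The policy uses target tracking on the `ApproximateBacklogSizePerInstance` CloudWatch metric, which reflects how many queued requests each instance has. The sketch keeps `min_capacity` at 1: asynchronous endpoints can scale in to zero instances, but scaling back out from zero needs an additional policy that is not shown here.

```python
"""Sketch: target-tracking auto scaling for a SageMaker async endpoint.

Assumptions (hypothetical): the endpoint and variant names, capacity bounds,
and backlog target. The Application Auto Scaling client is passed in.
"""
from typing import Any

SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def configure_backlog_autoscaling(
    autoscaling_client: Any,
    endpoint_name: str,
    variant_name: str = "AllTraffic",
    min_capacity: int = 1,
    max_capacity: int = 5,
    target_backlog_per_instance: float = 5.0,
) -> str:
    """Register the variant and attach a backlog-based scaling policy."""
    if not 0 <= min_capacity <= max_capacity:
        raise ValueError("require 0 <= min_capacity <= max_capacity")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

    # Step 1: make the variant's instance count a scalable target.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=SCALABLE_DIMENSION,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )

    # Step 2: keep the average queued requests per instance near the target,
    # adding instances when the backlog grows and removing them when it shrinks.
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-backlog-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=SCALABLE_DIMENSION,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_backlog_per_instance,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,
            "ScaleOutCooldown": 300,
        },
    )
    return resource_id
```

Tracking backlog per instance, rather than CPU, ties capacity directly to how many uploaded images are waiting, which is the demand signal the question cares about.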
C. Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses Karpenter for auto scaling. Host the model on the EKS cluster. Run a script to make an inference request for each image.
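EKS with Karpenter can scale, but the team must operate the cluster, node provisioning, container images, and the model-serving stack, which is more management effort than a managed SageMaker endpoint.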
D. Create an AWS Batch job that uses an Amazon Elastic Container Service (Amazon ECS) cluster. Specify a list of images to process for each AWS Batch job.
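AWS Batch suits scheduled or queued batch jobs, but this option requires building the container, compute environment, and job queue, and splitting the image list across jobs. That is more overhead than a managed ML inference endpoint that scales on its own.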