ML Associate Practice Q13

A. Store predictions in advance and retrieve them later without calling an endpoint

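Incorrect. Realtime inference means low-latency serving through an endpoint that is queried for predictions. Retrieving stored predictions without calling an endpoint does not match that definition.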

B. Deploy the model to a realtime inference endpoint and query that endpoint for predictions

Correct. The source material defines this objective as deploying a model for realtime inference and querying it for predictions, with a focus on low-latency serving through an endpoint. Because the team needs low-latency predictions, querying a model endpoint directly is the deployment method that matches the stated requirement.

C. Export the model only for offline analysis and avoid exposing any query interface

Incorrect. Low-latency serving requires a model endpoint that can be queried for predictions, and a model exported only for offline analysis exposes no query interface.

D. Use a deployment method designed only for delayed prediction results in batch workflows

Incorrect. The objective addresses realtime inference, not delayed batch-style prediction workflows.
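
To make the pattern in option B concrete, here is a minimal sketch of a client querying a realtime endpoint over HTTP. The source does not name a serving platform, so everything specific here is an assumption: the function names `build_request` and `query_endpoint` are hypothetical, the bearer-token header is one common authentication scheme, and the `dataframe_records` payload key follows the MLflow scoring-server convention rather than anything stated in the question.

```python
import json
import urllib.request


def build_request(endpoint_url, token, rows):
    """Build a POST request that carries feature rows as JSON.

    Assumption: the endpoint accepts a JSON body with a
    "dataframe_records" key (MLflow scoring-server convention).
    """
    body = json.dumps({"dataframe_records": rows}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(endpoint_url, token, rows, timeout_s=2.0):
    """Send the rows to the endpoint and return the parsed JSON reply.

    The short timeout reflects the low-latency expectation: a caller
    waits for the prediction inside the request itself.
    """
    request = build_request(endpoint_url, token, rows)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would invoke `query_endpoint(url, token, [{"feature_a": 1.0}])` with the URL and token of a deployed endpoint and receive the prediction in the same request-response cycle. That synchronous round trip is what separates this option from stored or batch-produced predictions.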