Question 34
Domain 5
You have trained a model using data that was preprocessed in a batch Dataflow pipeline, and now you need real-time inference while ensuring consistent data preprocessing between training and serving. What should you do?
Correct answer: B
Explanation
Real-time inference requires the serving path to apply the same preprocessing as training, so the transformation logic should be reusable outside the batch pipeline. Refactoring the code lets you "employ the same code in the endpoint," which preserves consistent feature handling and avoids training-serving skew.
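A minimal sketch of what this refactoring can look like. Everything in it is hypothetical and chosen for illustration: the feature names (`amount`, `country`), the `TransformParams` values, and the helper names do not come from the question. What matters is that `transform` is a pure function with no pipeline dependencies, so both the batch job and the prediction handler can import and call the same code. In an Apache Beam pipeline it could be applied with something like `beam.Map(transform, params=PARAMS)`. Here, a list comprehension stands in for the pipeline so the example stays self-contained.

```python
import math
from dataclasses import dataclass


# Hypothetical statistics computed once over the training data and
# exported alongside the model (values are illustrative only).
@dataclass(frozen=True)
class TransformParams:
    amount_mean: float
    amount_std: float
    country_vocab: tuple[str, ...]


def transform(raw: dict, params: TransformParams) -> dict:
    """Pure preprocessing function shared by the batch pipeline and the endpoint.

    It depends only on its inputs, so a Dataflow job and the serving code
    can both import it without pulling in any pipeline machinery.
    """
    amount = float(raw["amount"])
    # Log-scale, then standardize with the *training* statistics.
    scaled = (math.log1p(amount) - params.amount_mean) / params.amount_std
    # Map the categorical value to an index; unseen values go to an OOV bucket.
    country = raw.get("country", "")
    if country in params.country_vocab:
        country_id = params.country_vocab.index(country)
    else:
        country_id = len(params.country_vocab)
    return {"amount_scaled": scaled, "country_id": country_id}


PARAMS = TransformParams(amount_mean=3.0, amount_std=1.5,
                         country_vocab=("DE", "FR", "US"))


# Batch path: a list comprehension stands in for the Dataflow step here.
def batch_preprocess(records: list[dict]) -> list[dict]:
    return [transform(r, PARAMS) for r in records]


# Online path: the prediction handler calls the very same function
# before passing the features to the model.
def online_preprocess(request_instance: dict) -> dict:
    return transform(request_instance, PARAMS)


if __name__ == "__main__":
    records = [{"amount": 120.0, "country": "FR"},
               {"amount": 5.0, "country": "BR"}]  # "BR" is out of vocabulary
    # Both paths yield identical features for the same raw input.
    assert batch_preprocess(records) == [online_preprocess(r) for r in records]
    print(online_preprocess(records[1]))
```

This sketch keeps the fitted statistics in an explicit parameter object instead of hard-coding them separately in each path. That is one way to ensure both callers use the values computed from the training data.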
Why each option is right or wrong
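A. Perform data validation to ensure that the input data to the pipeline matches the input data format for the endpoint.
Data validation confirms that inputs have the expected format, but it does not apply the training transformations to real-time requests. It therefore does nothing to prevent training-serving skew.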
B. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline and employ the same code in the endpoint.
To avoid training-serving skew, the endpoint must run exactly the preprocessing logic used during training. The transformation code in the batch Dataflow pipeline should therefore be extracted into reusable code and invoked at inference time. Because the batch pipeline is not on the serving path, keeping the transforms only inside that pipeline would leave real-time requests without the same preprocessing that was applied to the training data.
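To make the skew concrete, the sketch below compares two serving paths. It uses hypothetical data and function names and only the standard library. The first path reuses a shared z-score transform with the statistics computed over the training set. The second is a hand-rewritten serving path that recomputes the statistics from whatever requests arrive together. In the second case, the same raw value maps to a different feature than it did in training, which is the mismatch that reusing the pipeline's transformation code avoids.

```python
import statistics

# Hypothetical training data for one numeric feature.
TRAIN_VALUES = [10.0, 20.0, 30.0, 40.0, 50.0]

# Statistics computed once by the batch pipeline over the full training set.
TRAIN_MEAN = statistics.fmean(TRAIN_VALUES)   # 30.0
TRAIN_STD = statistics.pstdev(TRAIN_VALUES)   # ~14.142


def standardize(x: float, mean: float, std: float) -> float:
    """Shared transform: z-score using externally supplied statistics."""
    return (x - mean) / std


def serve_with_shared_code(requests: list[float]) -> list[float]:
    # Consistent: reuse the training statistics exported by the pipeline.
    return [standardize(x, TRAIN_MEAN, TRAIN_STD) for x in requests]


def serve_with_reimplementation(requests: list[float]) -> list[float]:
    # Skewed: statistics are recomputed from the incoming requests, so a raw
    # value no longer maps to the feature the model saw during training.
    mean = statistics.fmean(requests)
    std = statistics.pstdev(requests) or 1.0  # guard against a single request
    return [standardize(x, mean, std) for x in requests]
```

For example, with requests `[30.0, 35.0]`, the shared path maps 30.0 to 0.0, as during training, while the reimplementation maps it to -1.0.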
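C. Refactor the transformation code in the batch data pipeline so that it can be used outside of the pipeline and share this code with the endpoint's end users.
Sharing the code with end users pushes preprocessing onto clients, whose versions and implementations you do not control, so consistency is not guaranteed. The endpoint itself should apply the transformations.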
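D. Batch the real-time requests using a time window, preprocess the batched requests using the Dataflow pipeline, and then send the preprocessed requests to the endpoint.
Buffering requests into time windows and routing them through the batch Dataflow pipeline adds latency that defeats real-time inference. It also makes every prediction depend on a batch job instead of the endpoint handling each request directly.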