Question 4
Domain 1: Developing Code for Data Processing using Python and SQL
A Structured Streaming job restarts after a transient cluster failure and must resume without rereading all previously committed data. What mechanism enables that?
Correct answer: B
Explanation
Structured Streaming uses a durable checkpoint to store progress metadata, including source offsets and state, so a restarted job can resume from the last committed point instead of rereading all data. On restart, the query reads this recovery information from its configured checkpoint location, so the restarted job must point at the same location it used before the failure.
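As a rough illustration of where the checkpoint is configured, here is a minimal PySpark sketch. The function name `start_checkpointed_query` and its path parameters are hypothetical, the built-in `rate` source and Parquet sink are only stand-ins for a real pipeline, and `spark` is assumed to be an existing `SparkSession`.

```python
def start_checkpointed_query(spark, output_path, checkpoint_path):
    """Start a streaming query whose progress is persisted at checkpoint_path.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    """
    # The built-in "rate" source generates rows continuously; it stands in
    # for a real source such as Kafka or cloud files.
    events = (
        spark.readStream
        .format("rate")
        .option("rowsPerSecond", 5)
        .load()
    )
    # checkpointLocation tells Spark where to record offsets and state.
    # Restarting with the same path resumes from the last committed batch.
    return (
        events.writeStream
        .format("parquet")
        .option("path", output_path)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

Calling the function again with the same `checkpoint_path` after a failure is what lets the query pick up where it left off; pointing it at a new, empty path would start the query from scratch.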
Why each option is right or wrong
A. Photon execution mode
Photon speeds query execution; it does not persist streaming progress for recovery.
B. A durable checkpoint location
Apache Spark Structured Streaming recovers a failed query from its checkpoint directory, which persists the query’s offset log, commit log, and state store metadata on durable storage. Under Spark’s Structured Streaming fault-tolerance model, the restarted query reads the last committed offsets from that checkpoint rather than replaying the entire source, so it resumes from the exact recovery point after the transient cluster failure.
C. A global temp view
A global temp view is a shared SQL view, not a fault-tolerance or offset-tracking mechanism.
D. Manual `VACUUM` before restarting
`VACUUM` deletes data files that a Delta table no longer references and that are older than the retention threshold; it does not restore streaming state or resume offsets.
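To make the recovery behaviour behind option B more concrete, the following is a toy model in plain Python, not Spark's actual implementation or on-disk format. The function `run_stream`, the file name `commits.json`, and the `fail_at_batch` parameter are all invented for illustration. It records the next offset to read only after a batch's output is written, so a restart continues from the last committed point.

```python
import json
import os
import tempfile


def run_stream(source, checkpoint_dir, sink, fail_at_batch=None):
    """Process `source` one record per micro-batch, tracking progress on disk."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    commit_file = os.path.join(checkpoint_dir, "commits.json")  # hypothetical name

    # Recover the next offset to read from the last committed batch, if any.
    next_offset = 0
    if os.path.exists(commit_file):
        with open(commit_file) as f:
            next_offset = json.load(f)["next_offset"]

    for offset in range(next_offset, len(source)):
        if fail_at_batch is not None and offset == fail_at_batch:
            raise RuntimeError("simulated transient cluster failure")
        sink.append(source[offset])  # write the batch output
        # Commit only after the output has been written.
        with open(commit_file, "w") as f:
            json.dump({"next_offset": offset + 1}, f)


def demo():
    source = ["a", "b", "c", "d", "e"]
    sink = []
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        try:
            run_stream(source, checkpoint_dir, sink, fail_at_batch=3)
        except RuntimeError:
            pass  # the "cluster" failed after committing offsets 0..2
        # Restart with the same checkpoint directory: resumes at offset 3.
        run_stream(source, checkpoint_dir, sink)
    return sink


if __name__ == "__main__":
    print(demo())  # ['a', 'b', 'c', 'd', 'e']
```

Each record reaches the sink exactly once because the restart reads the committed offset instead of starting at zero. The other three options have no counterpart to this commit file, which is why none of them can provide the resume behaviour.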