DE Professional Practice Q6

A. Run the full production job against shared bronze tables for every code change

Production jobs and shared bronze tables are for integration or validation, not unit tests.

B. Build small deterministic input DataFrames and assert on the transformed output

PySpark unit tests should isolate the transformation logic with tiny, fixed inputs so the output is predictable and can be compared exactly against expected rows and schema. In practice, that means constructing deterministic DataFrames in the test and asserting the transformed result, rather than relying on live sources, randomness, or cluster state, which would make the test nondeterministic and flaky.

C. Rely only on manual notebook inspection after deployment

Manual notebook review is ad hoc; tests should automatically assert expected outputs.

D. Skip tests because Spark plans can change between runtimes

Changing Spark plans does not eliminate the need to test transformation logic.

Question 6

Explanation

Why each option is right or wrong