Question 16
Domain 3: Data Transformation, Cleansing, and QualityIn Spark SQL, a join must treat two NULL business-key values as equal. Which operator should you use in the join condition?
Correct answer: C
Explanation
Spark SQL uses the null-safe equality operator "<=>" when NULLs must match in a join condition. Unlike "=", it treats two NULL values as equal, so rows with NULL business keys can join to each other.
Why each option is right or wrong
A. IS NOT DISTINCT FROM
`IS NOT DISTINCT FROM` is SQL-style syntax, but Spark join conditions use `<=>` for null-safe equality.
B. =
`=` does not treat `NULL = NULL` as true, so null business keys will not join.
C. <=>
Spark SQL’s null-safe equality operator is the `<=>` comparison defined in the SQL expression grammar, and it is the only equality test that returns true when both operands are NULL. Under normal `=` semantics, `NULL = NULL` evaluates to unknown, so a join on business keys containing NULLs would not match those rows; using `<=>` makes the join condition true for paired NULL values and still compares non-NULL keys normally.
D. IS DISTINCT FROM
`IS DISTINCT FROM` tests inequality, so it does the opposite of matching equal nulls.