Question 36
Domain 9: Debugging and Deploying
A pipeline is slow, and the team wants to add much larger clusters immediately. What should be checked first?
Correct answer: D
Explanation
Before adding larger clusters, check the real bottleneck: “data layout, skew, shuffle, code logic, or cluster capacity.” Scaling hardware only helps if cluster capacity is the limit; if the slowdown comes from skew, shuffle, or inefficient code, bigger clusters will not fix the root cause.
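As a minimal sketch of one such check, the Python snippet below measures key skew by comparing the record count of the largest key to the mean count per key. The function name `skew_ratio` and the sample keys are illustrative assumptions, not part of the question or of any specific platform's tooling.

```python
from collections import Counter


def skew_ratio(keys):
    """Return max records per key divided by mean records per key.

    A ratio near 1 means records are spread evenly across keys; a large
    ratio means one key dominates, so one task would do most of the work.
    """
    counts = Counter(keys)
    if not counts:
        raise ValueError("keys must not be empty")
    mean_count = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean_count


# Hypothetical join keys: customer "c1" holds 97 of 100 records.
sample_keys = ["c1"] * 97 + ["c2", "c3", "c4"]
ratio = skew_ratio(sample_keys)
print(f"skew ratio: {ratio:.2f}")
```

A high ratio suggests skew rather than a capacity limit: the dominant key still lands on a single task, so adding nodes leaves that task just as slow.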
Why each option is right or wrong
A. Whether the dashboard title can be shorter
Dashboard naming affects readability, not pipeline execution speed or resource bottlenecks.
B. Whether all logs can be deleted
Deleting logs may reduce storage use, but it does not diagnose or fix runtime bottlenecks.
C. Whether analysts can manually copy results
Manual copying changes workflow steps, not the underlying performance of the pipeline itself.
D. Whether the bottleneck is data layout, skew, shuffle, code logic, or cluster capacity
In performance tuning, the first step is to identify the limiting factor before changing cluster size, because extra compute only helps when the workload is actually capacity-bound. The slowdown may instead come from poor data layout, key skew, excessive shuffle, or inefficient code paths, none of which larger clusters will correct. The question asks what to check first, which points to diagnosing the root cause rather than assuming a hardware scale-up will resolve it.
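To make the "only helps when capacity-bound" point concrete, here is a rough sketch that uses Amdahl's law as a stand-in model (the source does not name it). It estimates the overall speedup when only part of the runtime benefits from more compute. The function name `estimated_speedup` and the fractions used are hypothetical values chosen for illustration.

```python
def estimated_speedup(capacity_bound_fraction, scale_factor):
    """Amdahl's-law estimate of overall speedup from scaling compute.

    capacity_bound_fraction: share of runtime that scales with cluster size.
    scale_factor: how many times more compute is added (e.g. 4 for 4x).
    """
    if not 0.0 <= capacity_bound_fraction <= 1.0:
        raise ValueError("capacity_bound_fraction must be between 0 and 1")
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    # The non-scalable share (skew, shuffle, slow code) keeps its full cost.
    remaining = (1.0 - capacity_bound_fraction) + capacity_bound_fraction / scale_factor
    return 1.0 / remaining


# Hypothetical pipeline where only 20% of runtime is capacity-bound.
mostly_skew = estimated_speedup(0.2, 4)
# Hypothetical pipeline where 90% of runtime is capacity-bound.
mostly_capacity = estimated_speedup(0.9, 4)
print(f"4x cluster, 20% capacity-bound: {mostly_skew:.2f}x faster")
print(f"4x cluster, 90% capacity-bound: {mostly_capacity:.2f}x faster")
```

Under these assumed fractions, quadrupling the cluster barely helps the first pipeline and substantially helps the second, which is why the bottleneck should be identified before paying for larger clusters.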