Question 29
Domain 2: Data Preparation
Where should a Databricks team usually store PDFs and other unstructured source files when building a governed RAG pipeline?
Correct answer: C
Explanation
Unity Catalog volumes are the governed storage layer for unstructured files such as PDFs in Databricks. They let a team store source documents in a managed location with access controls and lineage, which fits a governed RAG pipeline better than ad hoc cloud storage.
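Because a volume lives under a catalog and a schema, files in it are addressed by a path of the form /Volumes/<catalog>/<schema>/<volume>/<path>. The following is a minimal sketch of building such a path in plain Python. The helper name `volume_path` and the catalog, schema, volume, and file names are hypothetical, chosen only for illustration.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a Unity Catalog volume path: /Volumes/<catalog>/<schema>/<volume>/<parts...>.

    Illustrative helper only; it is not part of any Databricks library.
    """
    for name in (catalog, schema, volume):
        # Each level of the three-part namespace must be a single, non-empty name.
        if not name or "/" in name:
            raise ValueError(f"invalid catalog/schema/volume name: {name!r}")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *parts))


# Hypothetical names: catalog "main", schema "rag", volume "source_docs".
pdf_path = volume_path("main", "rag", "source_docs", "policies", "handbook.pdf")
print(pdf_path)  # /Volumes/main/rag/source_docs/policies/handbook.pdf
```

Assuming a volume with these names exists and the caller has read access to it, an ingestion step in the RAG pipeline could then open the PDF at that path like any other file.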
Why each option is right or wrong
A. DBFS root
DBFS root is legacy workspace storage that sits outside Unity Catalog governance, so it is not the usual location for shared source documents.
B. A notebook output folder
Notebook output folders hold generated results, not durable governed repositories for source PDFs.
C. Unity Catalog volumes
Unity Catalog volumes are the governed file-storage object in Databricks for unstructured assets, so PDFs and similar source documents belong there rather than in an unmanaged cloud bucket. A volume sits inside a catalog and schema, so it is covered by the same centralized permissions and auditing as other Unity Catalog objects. That is the expected storage pattern for a governed RAG pipeline.
D. Cluster local disk
Cluster local disk is ephemeral compute storage and can disappear when clusters restart or terminate.