Question 19
Domain 3: Knowledge Integration, Data Handling, Cognition, Planning, and Memory
What ETL architecture handles multiple sources?
Correct answer: A
Explanation
A modular ETL design handles multiple sources by separating “source-specific extractors” from a shared “common transformation layer,” so a new source can be added by writing one extractor rather than rewriting the whole pipeline. An orchestration tool such as Airflow provides “scheduling and error handling,” which coordinates the many ingestion paths reliably before the results are loaded into the vector database.
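The separation described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the question: the `Document` record, the two extractors, the field names (`id`, `body`, `key`, `content`), the `InMemoryVectorStore` class, and `toy_embed`, which is a placeholder and not a real embedding model.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Document:
    """Common record shape that every extractor must produce (hypothetical)."""
    source: str
    doc_id: str
    text: str


# Source-specific extractors: each one knows only its own raw format.
def extract_csv(raw: str) -> Iterable[Document]:
    for row in csv.DictReader(io.StringIO(raw)):
        yield Document("csv", row["id"], row["body"])


def extract_json(raw: str) -> Iterable[Document]:
    for item in json.loads(raw):
        yield Document("json", str(item["key"]), item["content"])


# Common transformation layer: applied identically to every source.
def transform(doc: Document) -> Document:
    normalized = " ".join(doc.text.split())  # collapse runs of whitespace
    return Document(doc.source, doc.doc_id, normalized)


def toy_embed(text: str) -> List[float]:
    """Placeholder vector (character count, word count), not a real embedding."""
    return [float(len(text)), float(len(text.split()))]


class InMemoryVectorStore:
    """Stand-in for a vector database loader."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], Tuple[List[float], str]] = {}

    def upsert(self, doc: Document, vector: List[float]) -> None:
        self.rows[(doc.source, doc.doc_id)] = (vector, doc.text)


# Registry: adding a source means adding one entry here and one extractor.
EXTRACTORS: Dict[str, Callable[[str], Iterable[Document]]] = {
    "csv": extract_csv,
    "json": extract_json,
}


def run_pipeline(inputs: Dict[str, str], store: InMemoryVectorStore) -> None:
    """Extract per source, then transform and load through the shared stages."""
    for name, raw in inputs.items():
        for doc in EXTRACTORS[name](raw):
            clean = transform(doc)
            store.upsert(clean, toy_embed(clean.text))
```

In this sketch only the extractors differ between sources; `transform` and the loader never see a raw CSV row or JSON object, which is the property the modular design is meant to guarantee.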
Why each option is right or wrong
A. Build modular ETL pipeline: source-specific extractors → common transformation layer → vector database loader. Use orchestration tool (Airflow) for scheduling and error handling.
This architecture is correct because heterogeneous inputs are best handled by isolating each source in its own extractor and then normalizing all of them through a shared transformation stage before loading into the vector store. That separation keeps source-specific logic from leaking into downstream processing. In practice, an orchestrator such as Apache Airflow schedules the DAG, manages task dependencies, and retries failed tasks. This is a standard way to coordinate many ingestion paths, and it means a failure in one source need not stop the entire ETL run.
B. Write custom scripts for each source, run manually.
C. Use single connector that works with all sources.
D. Export all data to CSV, process manually.
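The retry and failure-isolation behavior credited to the orchestrator under option A can be sketched in plain Python. This is a simplified stand-in and does not use Airflow's API; the helper names `run_with_retries` and `run_all` are hypothetical, and a real orchestrator would add scheduling, dependency tracking, backoff, and logging on top of this.

```python
from typing import Callable, Dict, Tuple


def run_with_retries(task: Callable[[], int], retries: int) -> int:
    """Call task up to retries + 1 times and re-raise the last error."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # attempts exhausted: surface the failure
    raise AssertionError("unreachable")


def run_all(
    tasks: Dict[str, Callable[[], int]], retries: int = 2
) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Run every source task independently.

    Returns (succeeded, failed). A task that fails after all retries is
    recorded in failed, and the remaining tasks still run.
    """
    succeeded: Dict[str, int] = {}
    failed: Dict[str, str] = {}
    for name, task in tasks.items():
        try:
            succeeded[name] = run_with_retries(task, retries)
        except Exception as exc:
            failed[name] = str(exc)
    return succeeded, failed
```

Each task here would wrap one source's extract, transform, and load path. Manual scripts (option B) and manual CSV processing (option D) offer no equivalent of this automatic retry and isolation.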