Question 6
Domain 1: Design and prepare a machine learning solution
Your team is building a data engineering and data science development environment. The environment must support the following requirements:
- support Python and Scala
- compose data storage, movement, and processing services into automated data pipelines
- the same tool should be used for the orchestration of both data engineering and data science
- support workload isolation and interactive workloads
- enable scaling across a cluster of machines
You need to create the environment. What should you do?
Correct answer: A
Explanation
Azure Databricks supports both Python and Scala, and its clusters provide workload isolation, interactive notebooks, and scaling across machines. Azure Data Factory is designed to "compose data storage, movement, and processing services into automated data pipelines," making it the orchestration layer for both data engineering and data science workflows.
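To make "one orchestration tool for both data engineering and data science" concrete, the sketch below builds an Azure Data Factory pipeline definition as a Python dict and prints it as JSON. It chains two `DatabricksNotebook` activities with `dependsOn`: a data-engineering notebook runs first, then a model-training notebook runs only if it succeeded. The activity shape is modeled on the Databricks Notebook activity JSON in the ADF documentation; the linked service name, pipeline name, and notebook paths are hypothetical placeholders, and the notebook language (Scala or Python) is set by the notebook itself in Databricks, not by its path. Treat this as an illustrative sketch and check field names against the current ADF schema before deploying.

```python
import json

# Hypothetical linked service name; in a real factory this must match an
# existing Azure Databricks linked service.
DATABRICKS_LINKED_SERVICE = "AzureDatabricksLinkedService"


def databricks_notebook_activity(name, notebook_path, depends_on=None):
    """Build one ADF activity that runs a Databricks notebook."""
    activity = {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": DATABRICKS_LINKED_SERVICE,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"notebookPath": notebook_path},
    }
    if depends_on:
        # Run this activity only after the named upstream activity succeeds.
        activity["dependsOn"] = [
            {"activity": depends_on, "dependencyConditions": ["Succeeded"]}
        ]
    return activity


def build_pipeline():
    """One pipeline orchestrating a data-engineering step followed by a
    data-science step, both on the same Databricks workspace."""
    return {
        "name": "EngineeringAndSciencePipeline",  # hypothetical name
        "properties": {
            "activities": [
                # Hypothetical path to a Scala data-preparation notebook.
                databricks_notebook_activity(
                    "PrepareData", "/Shared/etl/prepare_data_scala"),
                # Hypothetical path to a Python model-training notebook.
                databricks_notebook_activity(
                    "TrainModel", "/Shared/ml/train_model_python",
                    depends_on="PrepareData"),
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

The compute (Databricks) and the orchestration (ADF) stay separate: the pipeline only references notebooks through a linked service, so the same pipeline mechanism can sequence engineering and science work.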
Why each option is right or wrong
A. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
Azure Databricks is the service that satisfies the runtime requirements here: it natively supports both Python and Scala, runs interactive notebook-based workloads, isolates jobs by cluster, and scales horizontally across a Spark cluster. For the orchestration requirement, Azure Data Factory is the correct control plane because it is built to compose storage, movement, and processing activities into automated pipelines, with pipeline orchestration handled through ADF activities and triggers rather than inside the compute layer itself.
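The triggers mentioned above are what make ADF pipelines run automatically. As a minimal sketch, the Python below builds a `ScheduleTrigger` definition that starts a named pipeline on a recurrence, following the trigger JSON shape shown in the ADF scheduling documentation. The trigger name, pipeline name, and start time are hypothetical placeholders, and placing `name` at the top level mirrors the pipeline definition style; verify against the current schema before use.

```python
import json

# Hypothetical pipeline name that the trigger will start.
PIPELINE_NAME = "EngineeringAndSciencePipeline"


def schedule_trigger(name, pipeline_name, frequency, interval, start_time):
    """Build an ADF ScheduleTrigger definition that starts one pipeline."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,  # e.g. "Minute", "Hour", "Day"
                    "interval": interval,    # run every `interval` units
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            # The trigger references pipelines by name; it does no compute itself.
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder schedule: once a day starting at a hypothetical date.
    trigger = schedule_trigger(
        "DailyTrigger", PIPELINE_NAME, "Day", 1, "2024-01-01T02:00:00Z")
    print(json.dumps(trigger, indent=2))
```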
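B. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
Azure Data Factory would satisfy the orchestration requirement, but Apache Hive is a SQL-based (HiveQL) data warehouse engine on Hadoop, not a Python and Scala development environment with interactive notebooks. The compute side therefore does not meet the language and interactive-workload requirements.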
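C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.
Apache Spark on HDInsight supports Python and Scala and scales across a cluster, but Azure Container Instances is not an orchestration service. It runs individual containers on demand and cannot compose data storage, movement, and processing services into automated pipelines.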
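D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.
Azure Databricks meets the runtime requirements, but Azure Container Instances does not provide pipeline orchestration. It runs containers; it does not compose storage, movement, and processing activities into automated, triggered pipelines for both data engineering and data science.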