DP-100 Exam Prep


Designing and Implementing a Data Science Solution on Azure Study Guide

Use the domain outline below to connect each exam area (designing and preparing a machine learning solution, exploring data and running experiments, training and evaluating models, and deploying and operationalizing machine learning solutions) to scenario-based practice questions and their explanations.

How the Exam Is Structured

The Designing and Implementing a Data Science Solution on Azure (DP-100) exam validates your ability to design and prepare a machine learning solution, explore data and run experiments, train and evaluate models, and deploy and operationalize machine learning solutions. The ExamPal practice bank includes 175 premium questions and 40 free questions mapped across the official blueprint.

Domain | Weight | Focus
Domain 1: Design and prepare a machine learning solution | 20% | Task 1.1: Design an Azure Machine Learning workspace solution; Select workspace architecture
Domain 2: Explore data and run experiments | 25% | Task 2.1: Ingest and profile data; Load data into tools
Domain 3: Train and evaluate models | 20% | Task 3.1: Select evaluation metrics for model type; Use classification metrics
Domain 4: Deploy and operationalize machine learning solutions | 20% | Task 4.1: Prepare models for deployment; Register models and dependencies
Domain 5: Monitor, retrain, and manage ML lifecycle | 15% | Task 5.1: Monitor deployed models and endpoints; Track service performance

20% of exam

Domain 1: Design and prepare a machine learning solution

Covers the foundational Azure Machine Learning workspace, security, compute, environment, and data setup needed to build ML solutions. This domain emphasizes selecting the right workspace architecture and resources, managing access and governance, and preparing reusable development assets for experiments and pipelines.

Task 1.1: Design an Azure Machine Learning workspace solution
Select workspace architecture
Plan supporting Azure resources
Choose implementation interface
Task 1.2: Configure security, access, and governance
Configure role-based access control
Manage secrets and keys securely
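For the implementation-interface choice in Task 1.1 and the secret handling in Task 1.2, it helps to know what SDK code reads when it connects to a workspace. In the Python SDK v2, `MLClient.from_config(credential=DefaultAzureCredential())` looks for a `config.json` holding `subscription_id`, `resource_group`, and `workspace_name`. The stdlib sketch below mimics that lookup in simplified form so it runs without Azure; `find_workspace_config` and `load_workspace_config` are hypothetical helpers, not SDK functions, and the exact search order is an assumption.

```python
import json
from pathlib import Path
from typing import Dict, Optional

# Keys needed to locate a workspace; config.json holds no secrets.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start: Path, filename: str = "config.json") -> Optional[Path]:
    """Hypothetical helper: walk from `start` toward the root, checking each folder
    and its `.azureml` subfolder for the workspace configuration file."""
    for folder in [start, *start.parents]:
        for candidate in (folder / filename, folder / ".azureml" / filename):
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(path: Path) -> Dict[str, str]:
    """Hypothetical helper: read config.json and fail fast if a key is missing or empty."""
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config.json is missing: {', '.join(missing)}")
    # Authentication is handled separately (for example by DefaultAzureCredential
    # or a managed identity); never paste account keys into this file.
    return {key: config[key] for key in REQUIRED_KEYS}
```

In real code, the three values would be passed to `MLClient(credential, subscription_id, resource_group_name, workspace_name)`, while keys and connection strings stay in Azure Key Vault or are replaced by identity-based access.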

25% of exam

Domain 2: Explore data and run experiments

Covers data ingestion, preparation, splitting, training, tuning, and experiment tracking. This domain focuses on the practical workflow of preparing data, running models, and comparing results in Azure Machine Learning.
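Splitting deserves care because validation and test rows must be held out before any tuning, and the split should be reproducible across experiment runs. A minimal stdlib sketch, assuming rows fit in memory and a simple random (not stratified or time-based) split is appropriate; `train_val_test_split` is a hypothetical helper, not an Azure ML API.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    rows: Sequence[T],
    val_fraction: float = 0.15,
    test_fraction: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle with a fixed seed, then carve off the test and validation sets."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed -> same split on every run
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test : n_test + n_val]
    train = shuffled[n_test + n_val :]
    return train, val, test
```

Logging the seed and fractions with the run (for example as MLflow parameters) makes the split traceable when comparing experiments.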

Task 2.1: Ingest and profile data
Load data into tools
Examine schema and statistics
Identify data quality issues
Task 2.2: Prepare and transform data for modeling
Clean missing or invalid values
Encode categorical variables
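The profiling, cleaning, and encoding tasks above can be sketched with the standard library: count missing values per column, fill missing numbers with the median, and one-hot encode a categorical column. Treating `None` and empty strings as "missing" is an assumption, and all helper names are hypothetical.

```python
from statistics import median
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and empty strings as missing (an assumption; adjust per dataset)."""
    return value is None or value == ""


def missing_counts(rows: List[Row], columns: List[str]) -> Dict[str, int]:
    """Profiling step: count missing values per column."""
    return {col: sum(1 for row in rows if is_missing(row.get(col))) for col in columns}


def impute_median(rows: List[Row], column: str) -> List[Row]:
    """Cleaning step: fill missing numeric values with the column median."""
    fill = median(row[column] for row in rows if not is_missing(row.get(column)))
    return [
        {**row, column: fill if is_missing(row.get(column)) else row[column]}
        for row in rows
    ]


def one_hot_encode(rows: List[Row], column: str) -> Tuple[List[Row], List[str]]:
    """Encoding step: replace a categorical column with one 0/1 column per category."""
    categories = sorted({str(row[column]) for row in rows if not is_missing(row.get(column))})
    encoded = []
    for row in rows:
        value = row.get(column)
        new_row = {key: val for key, val in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(not is_missing(value) and str(value) == category)
        encoded.append(new_row)
    return encoded, categories
```

In a real pipeline, the median and the category list should be computed on the training split only and reused for the other splits and at inference time; these helpers recompute them from whatever rows they receive, which is fine for a demonstration but would leak information if applied to the full dataset.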

20% of exam

Domain 3: Train and evaluate models

Covers selecting evaluation metrics, diagnosing model fit issues, interpreting model behavior, improving performance, and assessing responsible AI considerations. This domain focuses on evaluating model quality and trustworthiness before deployment.
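One concrete responsible AI check is comparing selection rates (the share of positive predictions) across groups defined by a sensitive feature. The sketch below computes the demographic parity difference, the gap between the highest and lowest group selection rate. Fairlearn exposes a metric with this name (`fairlearn.metrics.demographic_parity_difference`), but this stdlib version is only an illustration, and what counts as an acceptable gap is a judgment call for each scenario.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def selection_rates(y_pred: Sequence[int], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of positive (1) predictions within each group."""
    if len(y_pred) != len(groups):
        raise ValueError("y_pred and groups must have the same length")
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(y_pred: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest minus smallest group selection rate; 0.0 means equal rates."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())
```

For example, if group A receives positive predictions 75% of the time and group B 25% of the time, the difference is 0.5, a gap that would normally prompt a closer look before deployment.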

Task 3.1: Select evaluation metrics for model type
Use classification metrics
Use regression metrics
Use clustering metrics
Align metrics with business goals
Task 3.2: Diagnose underfitting and overfitting
Compare training and validation results
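To make the metric choices in Task 3.1 and the comparison in Task 3.2 concrete, here is a stdlib sketch of the standard formulas. It is meant for reasoning about exam scenarios, not as a replacement for library implementations such as scikit-learn's `sklearn.metrics`; `diagnose_fit` and its thresholds are hypothetical.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary classification metrics from confusion-matrix counts (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0  # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, and R^2 for a regression model."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot if ss_tot else 0.0,  # 0.0 for constant targets (a convention)
    }


def diagnose_fit(train_score: float, val_score: float, min_score: float = 0.7, max_gap: float = 0.05) -> str:
    """Compare training and validation scores where higher is better (e.g. accuracy or R^2).
    The thresholds are hypothetical defaults, not Azure ML settings."""
    if train_score < min_score:
        return "underfitting"  # the model cannot fit even the training data well
    if train_score - val_score > max_gap:
        return "overfitting"  # strong on training data, noticeably worse on unseen data
    return "reasonable fit"
```

Which metric to optimize still depends on the business goal: recall matters more when missed positives are costly, precision when false alarms are costly.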

20% of exam

Domain 4: Deploy and operationalize machine learning solutions

Covers preparing models for deployment, serving real-time and batch inference, managing inference environments, and integrating deployed models with applications. This domain emphasizes operational readiness, endpoint configuration, and deployment lifecycle management.

Task 4.1: Prepare models for deployment
Register models and dependencies
Create scoring scripts
Package inference assets
Task 4.2: Deploy real-time inference endpoints
Deploy to online or Kubernetes targets
Select deployment settings
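Online deployments in Azure ML run a scoring script with two entry points: `init()`, called once when the container starts (load the model here), and `run()`, called for each request. The registered model files are made available under the directory named by the `AZUREML_MODEL_DIR` environment variable. The sketch below follows that shape with a stand-in model so it runs without Azure; `model.json`, its `weights` and `bias` fields, and the `{"data": [...]}` request format are assumptions for illustration.

```python
import json
import os
from typing import List

model = None  # loaded once by init() and reused by every run() call


def init() -> None:
    """Load the model when the deployment starts."""
    global model
    model_dir = os.environ.get("AZUREML_MODEL_DIR", ".")
    # Hypothetical artifact: a JSON file with linear-model weights and a bias term.
    with open(os.path.join(model_dir, "model.json"), encoding="utf-8") as handle:
        model = json.load(handle)


def run(raw_data: str) -> List[float]:
    """Score one request; the {"data": [[...], ...]} payload shape is an assumption."""
    rows = json.loads(raw_data)["data"]
    weights, bias = model["weights"], model["bias"]
    predictions = []
    for row in rows:
        if len(row) != len(weights):
            raise ValueError(f"expected {len(weights)} features, got {len(row)}")
        predictions.append(sum(w * x for w, x in zip(weights, row)) + bias)
    return predictions  # a JSON-serializable list
```

Loading in `init()` rather than `run()` keeps per-request latency low, which is one of the settings trade-offs behind real-time deployments.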

15% of exam

Domain 5: Monitor, retrain, and manage ML lifecycle

Covers monitoring deployed services, detecting drift and degradation, automating retraining, managing versioned assets, and supporting collaboration practices. This domain focuses on sustaining ML solutions in production with governance, reproducibility, and MLOps discipline.
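Automated retraining usually comes down to an explicit rule for when monitoring signals justify a new training run, for example one launched by a scheduled or alert-driven pipeline job. The policy below is a hypothetical sketch: the two signals (a drift score and a drop in a quality metric versus the baseline) and both thresholds are assumptions to tune per solution.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RetrainPolicy:
    max_drift: float = 0.2  # hypothetical drift-score ceiling
    max_metric_drop: float = 0.05  # hypothetical allowed drop from the baseline metric


def should_retrain(
    policy: RetrainPolicy, drift_score: float, baseline_metric: float, current_metric: float
) -> Tuple[bool, List[str]]:
    """Return whether to trigger retraining and the reasons, so the decision can be logged."""
    reasons = []
    if drift_score > policy.max_drift:
        reasons.append(f"drift {drift_score:.2f} exceeds {policy.max_drift:.2f}")
    drop = baseline_metric - current_metric  # assumes higher metric values are better
    if drop > policy.max_metric_drop:
        reasons.append(f"metric dropped by {drop:.2f}")
    return bool(reasons), reasons
```

Returning the reasons alongside the decision supports the governance side of this domain: each retraining run can record why it started.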

Task 5.1: Monitor deployed models and endpoints
Track service performance
Collect logs and diagnostics
Emit custom metrics
Task 5.2: Detect data drift and model degradation
Monitor incoming data drift
Compare production and baseline data
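Comparing production data with a baseline needs a distance between two distributions. One widely used measure is the population stability index (PSI), which bins both samples and sums (p_i - q_i) * ln(p_i / q_i) over the bins, where p_i and q_i are the production and baseline proportions in bin i. This sketch uses equal-width bins over the baseline range and a small floor for empty bins; both choices are assumptions, and Azure ML model monitoring computes its own drift metrics rather than running this code. A common rule of thumb reads PSI below 0.1 as little change and above 0.25 as significant drift.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def _bin_proportions(values: Sequence[float], edges: List[float], floor: float = 1e-6) -> List[float]:
    """Share of values in each bin; out-of-range values go to the nearest edge bin."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        index = min(max(bisect_right(edges, value) - 1, 0), len(counts) - 1)
        counts[index] += 1
    # Floor empty bins so the log term below stays finite.
    return [max(count / len(values), floor) for count in counts]


def population_stability_index(baseline: Sequence[float], production: Sequence[float], n_bins: int = 10) -> float:
    """PSI between a baseline sample and a production sample of one numeric feature."""
    low, high = min(baseline), max(baseline)
    if low == high:
        raise ValueError("baseline has no spread; cannot build bins")
    width = (high - low) / n_bins
    edges = [low + i * width for i in range(n_bins + 1)]
    p = _bin_proportions(production, edges)
    q = _bin_proportions(baseline, edges)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

Each term is non-negative, so identical samples score 0 and the index grows as production mass moves into bins the baseline rarely used.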

Key Terms to Know

These terms come from the shared terminology pack and appear throughout the question explanations.

Azure ML Python SDK v2
The version 2 Python software development kit used to interact programmatically with Azure Machine Learning resources.
Azure ML component
A reusable, versionable unit of work in Azure ML pipelines that encapsulates code, environment, inputs, and outputs.
Azure ML workspace
The central Azure Machine Learning resource that stores assets, runs, compute targets, and configuration for ML projects.
Azure Machine Learning Designer
A visual interface in Azure ML used to build, configure, and run machine learning pipelines without extensive coding.
Conda configuration file
A YAML file that specifies Conda packages and dependencies required for an ML environment.
Environment
An Azure ML SDK v2 class used to define the software environment, including dependencies, for training or inference.
Import Data
A Designer component used to bring external data, such as a CSV file from a website, into a pipeline.
MLOps
A set of practices for automating, managing, deploying, monitoring, and retraining machine learning systems.
MLTable
An Azure Machine Learning data asset format used to define tabular or file-based datasets for ML workflows.
MLflow
An open-source platform for tracking experiments, logging metrics and artifacts, and managing ML lifecycle tasks.
ParallelRunStep
An Azure ML SDK v1 pipeline step used for scalable, parallel batch inference over large datasets; the SDK v2 counterpart is a parallel job.
YAML
A human-readable configuration format commonly used to define Azure ML components and pipeline settings.
artifact
A file or folder produced or used by an ML run, such as images, models, or output datasets.
autoscaling
The ability of a compute resource to automatically increase or decrease the number of nodes based on workload demand.
binary classification
A supervised learning task in which a model predicts one of two possible classes.
compute cluster
An Azure ML compute target made up of multiple nodes that can run training or batch workloads.
conda_file
A parameter used when creating an Azure ML environment from a Conda YAML specification.
config.json
A workspace configuration file that stores connection details needed for SDK code to connect to Azure Machine Learning.
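Several of the terms above (Conda configuration file, `conda_file`, Environment, YAML) meet in one place: an environment definition. With the SDK v2, an environment can be created roughly as `Environment(image=..., conda_file="conda.yaml", name=...)` from `azure.ai.ml.entities`. The stdlib helper below only writes the Conda YAML text; the helper name, the `conda-forge` channel, and the package lists are illustrative assumptions, and pinning exact versions is advisable for reproducible runs.

```python
from typing import Sequence


def conda_yaml(
    name: str,
    python_version: str,
    conda_packages: Sequence[str] = (),
    pip_packages: Sequence[str] = (),
) -> str:
    """Render a minimal Conda environment file (the kind of file passed via conda_file)."""
    lines = [
        f"name: {name}",
        "channels:",
        "  - conda-forge",
        "dependencies:",
        f"  - python={python_version}",
    ]
    lines += [f"  - {package}" for package in conda_packages]
    if pip_packages:
        # List pip itself so conda installs it before the nested pip: packages.
        lines += ["  - pip", "  - pip:"]
        lines += [f"    - {package}" for package in pip_packages]
    return "\n".join(lines) + "\n"
```

For example, `conda_yaml("train-env", "3.10", ["scikit-learn"], ["mlflow", "azureml-mlflow"])` produces a file with `python=3.10` and `scikit-learn` as Conda dependencies and the two MLflow packages installed through pip.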

Official Materials and Guidance

This page is built from official Microsoft materials and the ExamPal shared release pack: the shared syllabus, topic tree, terminology pack, free pack, and premium pack.

  • Guidance: Microsoft Learn study guide, practice assessment, and sandbox
  • Domain outline: Design and prepare a machine learning solution (20-25%); Explore data and run experiments (20-25%); Train and deploy models (25-30%); Optimize language models for AI applications (25-30%).