DP-100 Exam Prep

DP-100 Exam Glossary - 40 Terms

This glossary covers terminology for the DP-100 exam, Designing and Implementing a Data Science Solution on Azure. Use these definitions with the study guide and practice questions.

A

artifact
A file or folder produced or used by an ML run, such as images, models, or output datasets.
autoscaling
The ability of a compute resource to automatically increase or decrease the number of nodes based on workload demand.
Azure Machine Learning Designer
A visual interface in Azure ML used to build, configure, and run machine learning pipelines without extensive coding.
Azure ML component
A reusable, versionable unit of work in Azure ML pipelines that encapsulates code, environment, inputs, and outputs.
Azure ML Python SDK v2
The version 2 Python software development kit used to interact programmatically with Azure Machine Learning resources.
Azure ML workspace
The central Azure Machine Learning resource that stores assets, runs, compute targets, and configuration for ML projects.
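
As a rough illustration of autoscaling, the sketch below clamps a demand estimate to a cluster's minimum and maximum node counts. The function name and the "one node per pending job" rule are assumptions made for this example; this is not Azure ML's actual scaling policy.

```python
def target_node_count(pending_jobs: int, min_nodes: int, max_nodes: int) -> int:
    """Clamp demand (assumed: one node per pending job) to the cluster limits."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return max(min_nodes, min(pending_jobs, max_nodes))


# An idle cluster scales down to its minimum; heavy demand is capped at the maximum.
for pending in (0, 3, 10):
    print(pending, "->", target_node_count(pending, min_nodes=0, max_nodes=4))
```

Setting the minimum to zero lets an idle cluster release all of its nodes, which is the usual reason to configure autoscaling on a compute cluster.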

B

binary classification
A supervised learning task in which a model predicts one of two possible classes.
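
A minimal sketch of binary classification and a few common performance metrics, using plain Python. The scores, labels, and the 0.5 threshold are made-up values for illustration only.

```python
def classify(scores, threshold=0.5):
    """Turn predicted probabilities into class labels 0 or 1."""
    return [1 if s >= threshold else 0 for s in scores]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.8, 0.6, 0.1]
y_pred = classify(scores)
metrics = binary_metrics(y_true, y_pred)
print(y_pred, metrics)
```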

C

compute cluster
An Azure ML compute target made up of one or more nodes that can run training or batch workloads and can scale between a minimum and maximum node count.
Conda configuration file
A YAML file that specifies Conda packages and dependencies required for an ML environment.
conda_file
A parameter of the Environment class that points to the Conda YAML specification used to build an Azure ML environment.
config.json
A workspace configuration file that stores connection details needed for SDK code to connect to Azure Machine Learning.
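
To make the Conda configuration file and config.json entries concrete, the sketch below writes a small example of each to a temporary folder and reads the workspace settings back. The package list, environment name, file names, and angle-bracket placeholders are assumptions for illustration; substitute your own values.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical Conda specification for a training environment.
CONDA_YAML = """\
name: training-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
      - scikit-learn
"""

# config.json holds the three values that identify a workspace.
WORKSPACE_CONFIG = {
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>",
}


def write_project_files(folder: Path) -> tuple[Path, Path]:
    """Write conda.yaml and config.json into the given folder."""
    conda_path = folder / "conda.yaml"
    conda_path.write_text(CONDA_YAML)
    config_path = folder / "config.json"
    config_path.write_text(json.dumps(WORKSPACE_CONFIG, indent=2))
    return conda_path, config_path


with tempfile.TemporaryDirectory() as tmp:
    conda_path, config_path = write_project_files(Path(tmp))
    loaded = json.loads(config_path.read_text())
    print(sorted(loaded))
```

With the azure-ai-ml package installed, `MLClient.from_config(credential=...)` reads a config.json like this one, and `Environment(name=..., image=..., conda_file="conda.yaml")` consumes the Conda file.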

D

data leakage
The unintended use of information from evaluation or future data during training, leading to overly optimistic performance.
differential privacy
A privacy-preserving technique that adds statistical noise to outputs to protect individual records in a dataset.
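
One common way to add the statistical noise mentioned under differential privacy is the Laplace mechanism. The sketch below applies it to a counting query; the data, the epsilon value, and the function names are assumptions for illustration, and this is not a production-grade implementation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that match the predicate."""
    true_count = sum(1 for r in records if predicate(r))
    # A counting query changes by at most 1 when one record is added or removed,
    # so its sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 61, 38]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

A smaller epsilon means more noise and stronger privacy, at the cost of less accurate results.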

E

Environment
An Azure ML SDK v2 class used to define the software environment, including dependencies, for training or inference.

F

fairness
The principle of ensuring that model outcomes and performance do not systematically disadvantage protected groups.

G

generalization
A model’s ability to perform well on unseen data rather than only on data used during training.

H

hold-out set
A portion of data kept separate from training and used only for evaluation to measure how well a model generalizes.
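
A minimal sketch of creating a hold-out set with the standard library. The 20% fraction and the fixed seed are arbitrary choices for this example.

```python
import random


def hold_out_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and return (training rows, hold-out rows)."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = hold_out_split(range(10), test_fraction=0.2)
print(len(train), len(test))
```

Because the two lists never share a row, evaluating on the hold-out rows gives an estimate of generalization that is not inflated by data leakage.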

I

image classification
A machine learning task in which a model predicts the category of an input image.
Import Data
A Designer component used to bring external data, such as a CSV file from a website, into a pipeline.
init()
A required function in a ParallelRunStep entry script that performs one-time setup, such as loading a model, before any mini-batches are processed.
input tokens
The individual text units, such as words or subwords, that are processed by a language model.
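
To illustrate input tokens, the toy tokenizer below splits text into lowercase words and punctuation marks. This is an assumption-laden simplification: production language models typically use learned subword tokenizers, so their token boundaries and counts will differ.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


tokens = toy_tokenize("Azure ML scales training jobs.")
print(tokens, len(tokens))
```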

M

machine translation
A natural language processing task where a model translates text from one language into another.
ml_client.data.get
A Python SDK v2 method used to retrieve a registered data asset from an Azure ML workspace by name and version (or label).
MLflow
An open-source platform for tracking experiments, logging metrics and artifacts, and managing ML lifecycle tasks.
mlflow.log_artifact
An MLflow function used to log a file or directory, such as a folder of images, as an experiment artifact.
mlflow.log_dict
An MLflow function used to log a dictionary, such as a set of RGB values, as a JSON or YAML artifact file during a run.
mlflow.log_metric
An MLflow function used to record a single numeric metric, including custom telemetry values, during a run.
MLOps
A set of practices for automating, managing, deploying, monitoring, and retraining machine learning systems.
MLTable
An Azure Machine Learning data asset type, defined by an MLTable YAML file, that describes how data files are loaded into tabular form for ML workflows.
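
The three MLflow logging functions above can be sketched together as follows. The metric name, artifact file name, and helper functions are hypothetical; `mlflow` is imported inside the logging function so the RGB helper still runs where MLflow is not installed.

```python
def mean_rgb(pixels):
    """Average a list of (r, g, b) tuples into a dictionary."""
    n = len(pixels)
    return {
        "r": sum(p[0] for p in pixels) / n,
        "g": sum(p[1] for p in pixels) / n,
        "b": sum(p[2] for p in pixels) / n,
    }


def log_run_outputs(pixels, report_path):
    """Log a metric, a dictionary, and a file to a new MLflow run."""
    import mlflow  # lazy import: requires the mlflow package

    with mlflow.start_run():
        mlflow.log_metric("pixel_count", len(pixels))       # one numeric value
        mlflow.log_dict(mean_rgb(pixels), "mean_rgb.json")  # dictionary saved as JSON
        mlflow.log_artifact(str(report_path))               # existing local file


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
print(mean_rgb(pixels))
```

Calling `log_run_outputs(pixels, "report.txt")` assumes that `report.txt` exists locally and that an MLflow tracking location is available, for example the default local `mlruns` folder or an Azure ML workspace.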

N

node
An individual machine or compute instance within a cluster.

P

ParallelRunStep
An Azure ML pipeline step used for scalable parallel batch inference over large datasets.
performance metrics
Quantitative measures used to assess how well a machine learning model performs.
pipeline
A sequence of connected ML workflow steps used to automate data preparation, training, evaluation, or deployment.
protected groups
Population groups defined by sensitive attributes such as ethnicity or gender that are monitored for fairness.
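
The init() and run() functions that a ParallelRunStep entry script must define can be sketched as below. The placeholder "model" (a word counter) and the string inputs are assumptions for illustration; in a real step, each mini-batch is supplied by the pipeline and the model is loaded from disk.

```python
# entry_script.py: a sketch of the init()/run() contract
model = None


def init():
    """Called once per worker process: do expensive setup here."""
    global model
    # Placeholder model: counts the words in a piece of text.
    model = lambda text: len(text.split())


def run(mini_batch):
    """Called once per mini-batch: return one result per processed item."""
    results = []
    for item in mini_batch:
        results.append(f"{item}: {model(item)}")
    return results


if __name__ == "__main__":
    # Local smoke test that mimics how the step calls the two functions.
    init()
    print(run(["first input row", "second"]))
```

Keeping setup in init() means the cost of loading the model is paid once per worker rather than once per mini-batch.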

R

registered data asset
A versioned, named reference to a dataset or other data resource, registered in an Azure ML workspace so it can be reused across jobs.
run()
A required function in a ParallelRunStep entry script that is called for each mini-batch of input data and returns the results for that mini-batch.
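
Retrieving a registered data asset by name and version with ml_client.data.get might look like the sketch below. The asset name, version, and placeholder path are hypothetical, and a stand-in client replaces a real `MLClient` so the example runs without an Azure subscription.

```python
from types import SimpleNamespace


def get_data_asset_path(ml_client, name: str, version: str) -> str:
    """Look up a registered data asset and return the location it points to."""
    asset = ml_client.data.get(name=name, version=version)
    return asset.path


def _fake_get(name, version):
    """Stand-in for the SDK call: returns an object with the same attributes used above."""
    return SimpleNamespace(name=name, version=version, path="<path-to-data>")


# Stand-in client (assumption): swap in a real MLClient to query a workspace.
fake_client = SimpleNamespace(data=SimpleNamespace(get=_fake_get))
print(get_data_asset_path(fake_client, "diabetes-data", "1"))
```

Pinning the version in the call is what makes a training job reproducible: the same name and version always resolve to the same data reference.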

S

selection rate
A fairness metric that measures the proportion of a group's members to whom a model assigns the favorable outcome.
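
As a worked example of selection rate, the sketch below computes the rate for each group from a list of predictions. The predictions and group labels are made-up values, and 1 is assumed to be the favorable outcome.

```python
from collections import defaultdict


def selection_rates(predictions, groups, favorable=1):
    """Return the fraction of favorable predictions for each group."""
    selected = defaultdict(int)
    total = defaultdict(int)
    for pred, group in zip(predictions, groups):
        total[group] += 1
        if pred == favorable:
            selected[group] += 1
    return {group: selected[group] / total[group] for group in total}


preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = selection_rates(preds, groups)
print(rates)  # group A: 3 of 4 selected, group B: 1 of 4 selected
```

A large gap between groups, here 0.75 versus 0.25, is a signal to investigate whether the model disadvantages a protected group.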

T

telemetry
Operational or custom logging data collected from ML systems to monitor behavior and performance.

V

versioning
The practice of tracking changes to assets like components, data, or models so specific versions can be reused reliably.

Y

YAML
A human-readable configuration format commonly used to define Azure ML components and pipeline settings.
