MLA-C01 Exam Glossary - 184 Terms

Terminology for the AWS Certified Machine Learning Engineer - Associate (MLA-C01) exam. Use these definitions with the study guide and practice questions.


A

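accuracy: A classification metric equal to the fraction of predictions that match the true labels.
Amazon AppFlow: A managed integration service that transfers data between SaaS applications and AWS services such as Amazon S3; listed as a data ingestion option for ML workloads.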
Amazon Bedrock foundation models: Foundation models available through Amazon Bedrock, listed as an option alongside built-in, pre-trained, and custom models.
Amazon CloudWatch: An AWS monitoring service used here to track machine learning endpoint metrics such as latency, error rate, invocations, and model latency.
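Amazon EBS: Block storage attached to EC2 instances; listed as a storage option for ML workloads.
Amazon EFS: A managed, shared NFS file system; listed as a storage option for ML workloads.
Amazon EMR: A managed big data platform that runs frameworks such as Apache Spark; listed for distributed data transformation.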
Amazon EventBridge: An AWS event bus service used for event-driven ML triggers.
Amazon Forecast: An AWS AutoML service for forecasting.
Amazon FSx for Lustre: An AWS storage service used for ML workloads, noted for high-throughput training reads.
Amazon Kinesis Data Streams: An AWS data ingestion service for streaming data.
Amazon MSK: Amazon Managed Streaming for Apache Kafka, a managed Kafka service; listed as a streaming data ingestion option for ML workloads.
Amazon Personalize: An AWS AutoML service for recommendations and personalization.
Amazon S3: An AWS storage service used for ML workloads; the text notes Standard, Intelligent-Tiering, and Glacier storage classes.
anomaly detection: A machine learning problem framing focused on identifying unusual or outlier observations.
Apache Spark: A distributed data processing framework used for transformation.
asynchronous: A SageMaker endpoint type that queues incoming requests and processes them in the background, suited to large payloads and long processing times.
audit trails: Records of actions and events used to support auditing and traceability.
Auto Scaling: A scaling capability used for inference at scale.
AutoML: Automated machine learning, represented in the text by SageMaker Autopilot, Amazon Forecast, and Amazon Personalize.
Avro: A data format used for ingesting and storing data.
AWS AI Service Cards: AWS documentation artifacts that provide information about AI services and their intended use.
AWS Budgets: An AWS service used to set and monitor cost budgets and track spending.
AWS CDK: The AWS Cloud Development Kit, an Infrastructure as Code tool.
AWS CloudFormation: An AWS Infrastructure as Code service.
AWS Cost Explorer: An AWS service used to analyze and track cloud costs.
AWS Database Migration Service (DMS): An AWS service that migrates and replicates data from databases; listed as a data ingestion option for ML workloads.
AWS Glue: An AWS data ingestion and ETL service used for ML data preparation and transformation.
AWS Glue Data Quality: An AWS service for assessing data quality.
AWS Glue DataBrew: A visual, no-code data preparation tool that uses recipes for data prep.
AWS Glue ETL: An AWS Glue-based extract-transform-load capability used for data transformation.
AWS IoT Greengrass: An AWS service used for edge deployment.
AWS Lambda: A serverless compute service, used here for lightweight data transforms.

B

batch ingestion: A data ingestion pattern that ingests data in discrete batches, contrasted in the text with streaming ingestion.
batch normalization: A training technique that normalizes layer inputs across each mini-batch to stabilize and speed up neural network training.
batch transform: A SageMaker inference mode that processes data in batches rather than through an always-on endpoint.
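Bayesian: A hyperparameter tuning strategy that uses the results of earlier trials to choose the next hyperparameter values to try.
BERT: A transformer-based language model architecture (Bidirectional Encoder Representations from Transformers), listed as a model selection option.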
bias detection: The process of identifying bias in datasets; the text specifically mentions pre-training bias metrics.
bias drift: A change over time in the bias characteristics of a model or its outputs.
bias-variance tradeoff: The tradeoff between model bias and variance used to reason about overfitting and underfitting.
BLEU: An NLP evaluation metric that scores generated text by its n-gram overlap with reference text, commonly used for machine translation.
blue/green deployment: A deployment strategy that shifts traffic between two environments to reduce downtime and risk during releases.

C

canary deployment: A deployment strategy that releases a model to a small subset of traffic first to validate behavior before broader rollout.
CI/CD: An abbreviation for continuous integration and continuous delivery, referring to automated build, test, and deployment pipelines.
class weighting: A class-imbalance technique that assigns different weights to classes during model training.
classification: A machine learning problem framing where the model predicts discrete classes.
CloudTrail: An AWS service that records API activity for auditing and compliance.
CloudWatch alarms: Configured alerts in Amazon CloudWatch that trigger when monitored metrics cross defined thresholds.
CloudWatch dashboards: Custom visual displays in Amazon CloudWatch used to track metrics and KPIs.
clustering: A machine learning problem framing where the model groups similar data points.
CodeBuild: An AWS service used with SageMaker Projects to build components in machine learning CI/CD pipelines.
CodePipeline: An AWS service used with SageMaker Projects to orchestrate machine learning CI/CD pipelines.
concept drift: A change in the relationship between input data and the target outcome over time.
confusion matrix: A table used to evaluate classification performance by comparing predicted and actual classes.
continuous delivery: A software delivery practice where validated changes are automatically prepared for release through a pipeline, enabling frequent deployment to production or other environments.
continuous integration: A software delivery practice where code changes are frequently merged and validated through automated builds and tests as part of a pipeline.
cost allocation: The process of assigning cloud costs to specific workloads, teams, or resources.
cross-validation: A model evaluation technique that repeatedly trains and validates on different data splits.
CSV: A comma-separated data format used for ingesting and storing tabular data.
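
The confusion matrix entry above, together with the accuracy, precision, recall, and F1 entries elsewhere in this glossary, can be illustrated with a minimal sketch. The function names and sample labels here are hypothetical and chosen only for illustration; the sketch assumes a binary problem with one positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive=1):
    """Derive common metrics from the confusion matrix cells."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    actual = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(confusion_counts(actual, predicted))  # (2, 1, 1, 2)
    print(classification_metrics(actual, predicted))
```

Returning 0.0 when a denominator is zero is one convention among several; some libraries warn or return NaN instead.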

D

data drift: A change in the distribution of input data over time.
data quality: The quality of data as measured by completeness, consistency, accuracy, and deduplication.
data quality drift: A change in the statistical properties or quality of input data over time.
DeepAR: A SageMaker built-in forecasting algorithm based on recurrent neural networks, listed as a model selection option.
distributed training: A training approach that spreads model training across multiple resources.
dropout: A regularization technique that randomly deactivates units in a neural network during training.

E

early stopping: A regularization technique that halts training when a validation metric stops improving.
ECS: An AWS container orchestration service listed for custom inference.
EKS: An AWS container orchestration service listed for custom inference.
embeddings: Derived representations used as features during feature engineering.
error rate: The proportion of requests that result in errors.
execution roles: IAM roles used by SageMaker to perform actions on behalf of a job or service.

F

F1: A classification metric equal to the harmonic mean of precision and recall.
Fargate: A serverless compute engine for containers on ECS and EKS, listed for custom inference container orchestration.
feature attribution drift: A change over time in how much individual features contribute to model predictions.
feature engineering: The process of creating or transforming input variables, including aggregations, time-window features, embeddings, and derived features.
file mode: A SageMaker training data input mode that downloads the full dataset to the training instance's storage before training starts.
fine-tuning: A training approach that adapts a pre-trained model to a specific task or dataset.
forecasting: A machine learning problem framing for predicting future values or trends.

G

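Glacier: An Amazon S3 archival storage class listed as an option for ML workloads.
Graviton: An AWS-designed Arm-based processor family, available on EC2 instance types listed for inference at scale.
grid: A hyperparameter tuning strategy that evaluates every combination of values from a predefined set.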

H

Hyperband: A hyperparameter tuning strategy that allocates more resources to promising configurations and stops underperforming trials early.
hyperparameter tuning: The process of searching for the best hyperparameter values for a model.

I

IAM: An acronym for AWS Identity and Access Management, used to control access to AWS resources.
imputation: A data preparation technique for filling in missing values.
inference endpoint: A deployed endpoint that serves model predictions for inference requests.
inference pipelines: A SageMaker deployment pattern that chains multiple containers or steps for inference.
Inferentia: An AWS-designed machine learning accelerator for inference, available on EC2 Inf instance types and listed for inference at scale.
Infrastructure as Code: A way to define and manage infrastructure programmatically rather than manually.
instance types: The specific compute configurations used for AWS workloads.
Intelligent-Tiering: An Amazon S3 storage class that automatically moves objects between access tiers based on access patterns; listed as an option for ML workloads.
invocations: The number of times an ML endpoint is called.
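
As a small illustration of the imputation entry above, the following sketch fills missing values with the mean of the observed values. Mean imputation is only one common strategy (median or a constant are others), and the function name and the use of None as the missing-value marker are assumptions made for this example.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    # Hypothetical column with one missing value.
    print(impute_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```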

J

JSON: A structured data format used for ingesting and storing data.

K

k-fold: A cross-validation method that splits data into k folds and rotates the validation fold.
k-means: A clustering algorithm that partitions data into k groups by assigning each point to the nearest cluster centroid.
Kinesis Data Firehose: An AWS data ingestion service (now named Amazon Data Firehose) that delivers streaming data to destinations such as Amazon S3.
KMS: An acronym for AWS Key Management Service, used for encryption at rest.
KPI: An acronym for key performance indicator, a metric used to measure performance.
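
The k-fold entry above describes splitting data into k folds and rotating the validation fold. A minimal sketch of that rotation, using index lists only, might look like this; the function name is hypothetical, and the sketch does not shuffle, which real workflows often do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


if __name__ == "__main__":
    for train, validation in k_fold_indices(5, 2):
        print(train, validation)
    # [3, 4] [0, 1, 2]
    # [0, 1, 2] [3, 4]
```

Each sample appears in exactly one validation fold, so every observation is used for both training and validation across the k rounds.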

L

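L1: A regularization method that penalizes the sum of the absolute values of model weights, which encourages sparse weights.
L2: A regularization method that penalizes the sum of squared model weights, which shrinks weights toward zero.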
latency: The time taken for an ML endpoint to return a response.
learning curves: Plots used to assess model performance as training data size or training progress changes.
least-privilege policies: Access policies that grant only the minimum permissions required to perform a task.
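linear regression: A regression algorithm that models the target as a linear combination of input features.
logistic regression: A classification algorithm that models class probability by applying the logistic function to a linear combination of input features.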

M

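MAE: A regression metric, mean absolute error: the average absolute difference between predicted and actual values.
MAP: A ranking metric, mean average precision: the mean of the per-query average precision scores.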
Mechanical Turk: A data labeling service mentioned as an option for labeling data.
model approval workflows: Approval processes used to gate model promotion before deployment.
model latency: The time taken by the model portion of an inference request to produce a prediction.
model lineage: The traceable history of a model, including its origins, changes, and related artifacts.
model quality drift: A decline or change in model performance over time as monitored in production.
model registry: A repository for storing and managing approved machine learning models and their metadata.
model versioning: The practice of tracking and managing different versions of a machine learning model.
multi-container endpoints: SageMaker endpoints that run multiple containers for inference.
multi-model endpoints: SageMaker endpoints that host multiple models on a single endpoint.

N

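NDCG: A ranking metric, normalized discounted cumulative gain, which rewards placing relevant items near the top of a ranked list.
neural networks: A class of machine learning models built from layers of interconnected units with learned weights.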

O

one-hot encoding: A data encoding technique that represents categorical values as binary vectors.
ORC: A columnar data format (Optimized Row Columnar) used for ingesting and storing data.
ordinal encoding: A data encoding technique that maps categories to an ordered numeric representation.
overfitting: A modeling problem where a model fits training data too closely and generalizes poorly.
oversampling: A class-imbalance technique that increases the representation of minority classes.
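
The one-hot encoding and ordinal encoding entries above can be contrasted with a short sketch. The function names and the sample categories are hypothetical; the one-hot helper orders columns alphabetically, which is an arbitrary choice made for this example.

```python
def one_hot(values):
    """Encode each categorical value as a binary vector, one column per category."""
    categories = sorted(set(values))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        rows.append(row)
    return categories, rows


def ordinal(values, order):
    """Map each category to its position in a caller-supplied ordering."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[value] for value in values]


if __name__ == "__main__":
    print(one_hot(["red", "green", "red"]))
    # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
    print(ordinal(["low", "high", "medium"], ["low", "medium", "high"]))
    # [0, 2, 1]
```

One-hot encoding suits categories with no natural order, while ordinal encoding is a reasonable fit only when the order is meaningful, as with low, medium, and high.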

P

Parquet: A columnar data format used for ingesting and storing data, especially in ML workflows.
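pipe mode: A SageMaker training data input mode that streams data from Amazon S3 to the training container instead of downloading it first.
PR AUC: A classification metric: the area under the precision-recall curve, often used for imbalanced classes.
precision: A classification metric: the fraction of predicted positives that are actually positive.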
prompt engineering: The practice of crafting prompts for foundation models.

R

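R²: A regression metric, the coefficient of determination: the proportion of variance in the target that the model explains.
random: A hyperparameter tuning strategy that samples hyperparameter values at random from the defined ranges.
RCF: Random Cut Forest, a SageMaker built-in algorithm for anomaly detection, listed as a model selection option.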
real-time: A SageMaker endpoint type that serves inference synchronously.
recall: A classification metric: the fraction of actual positives that the model correctly identifies.
recommendation: A machine learning problem framing for suggesting items or content to users.
RecordIO: A binary record data format, used by SageMaker built-in algorithms as protobuf RecordIO, for ingesting and storing data.
regression: A machine learning problem framing where the model predicts continuous values.
regularization: A set of techniques used to reduce overfitting, including L1, L2, dropout, and early stopping.
residual plots: Plots used to analyze prediction errors, especially for regression.
resource tagging: The practice of assigning tags to resources so costs and usage can be tracked and allocated.
right-sizing: The practice of selecting appropriately sized instance types to match workload needs and reduce cost.
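RMSE: A regression metric, root mean squared error: the square root of the average squared difference between predicted and actual values.
ROC AUC: A classification metric: the area under the receiver operating characteristic curve, which plots true positive rate against false positive rate.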
role chaining: A pattern in which one IAM role assumes another role to access additional permissions.
ROUGE: An NLP evaluation metric that scores generated text by its overlap with reference text, commonly used for summarization.
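
The regression metrics in this glossary (MAE, RMSE, and R²) can be computed from the same list of prediction errors. The following is a minimal sketch using the standard definitions; the function name and sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, and R² for paired actual and predicted values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    # Hypothetical values: one prediction is off by 2, the rest are exact.
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 6]))
```

In this example the single large error raises RMSE (1.0) more than MAE (0.5), which reflects how RMSE weights large errors more heavily.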

S

SageMaker Automatic Model Tuning: An AWS SageMaker service for hyperparameter tuning.
SageMaker Autopilot: An AWS AutoML service.
SageMaker built-in algorithms: Predefined SageMaker algorithms used instead of custom models or pre-trained models.
SageMaker Clarify: An AWS SageMaker tool used for pre-training and post-training bias metrics, and for explainability with SHAP.
SageMaker Data Wrangler: An AWS SageMaker tool used for data preparation and transformation.
SageMaker Edge Manager: An AWS SageMaker service for edge deployment.
SageMaker endpoint: A SageMaker deployment target for model inference.
SageMaker Experiments: An AWS SageMaker service for tracking metrics across training runs.
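SageMaker Ground Truth: An Amazon SageMaker data labeling service for building labeled datasets with your own, vendor, or Mechanical Turk workforces.
SageMaker Ground Truth Plus: A managed Amazon SageMaker data labeling service in which an AWS-managed workforce performs the labeling on your behalf.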
SageMaker Inference Recommender: An AWS SageMaker tool used to recommend optimal endpoint sizing for inference workloads.
SageMaker JumpStart: A SageMaker capability for using pre-trained models.
SageMaker Model Cards: Documentation artifacts for SageMaker models that describe model details and usage information.
SageMaker Model Monitor: An AWS SageMaker feature used to monitor model performance and data quality, including drift detection.
SageMaker Pipelines: An AWS SageMaker workflow service for ML pipelines such as preprocessing, training, evaluation, and deployment.
SageMaker Projects: An AWS SageMaker capability used to set up machine learning CI/CD workflows, including integration with CodePipeline and CodeBuild.
SageMaker training jobs: SageMaker jobs used to train models; the text mentions spot training, distributed training, pipe mode, and file mode.
savings plans: An AWS pricing model used to reduce compute costs through committed usage.
Secrets Manager: An AWS service used to store and manage credentials and other secrets.
serverless: A SageMaker endpoint type that runs inference without managing servers.
shadow deployment: A safe rollout pattern where a new model receives mirrored traffic for evaluation without affecting live responses.
SHAP: A model explainability method used by SageMaker Clarify.
SMOTE: A technique used to address class imbalance by generating synthetic minority examples.
spot training: A cost optimization approach that uses spare AWS capacity for training jobs at lower cost.
Step Functions: An AWS orchestration service used for cross-service ML workflows.
stratified sampling: A sampling method that preserves class proportions when splitting data.
streaming ingestion: A data ingestion pattern that continuously ingests data as it is produced, contrasted in the text with batch ingestion.
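
The stratified sampling entry above says the split preserves class proportions. One way to sketch that idea is to split each class separately and then combine the results; the function name, the seed parameter, and the rounding rule are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 8 of class "a", 4 of class "b".
    labels = ["a"] * 8 + ["b"] * 4
    train, test = stratified_split(labels)
    print(len(train), len(test))  # 9 3
```

With a 2:1 class ratio in the full data, the test set here holds two "a" samples and one "b" sample, so the ratio carries over to both partitions.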

T

target encoding: A data encoding technique that replaces each category with a statistic of the target variable, such as its mean, for that category.
time-based splits: A data splitting method that partitions data according to time order.
time-series CV: A cross-validation method designed for time-ordered data.
TLS: An acronym for Transport Layer Security, used for encryption in transit.
train/validation/test: A data splitting scheme that divides data into training, validation, and test sets.
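Trainium: An AWS-designed machine learning accelerator built primarily for training, available on EC2 Trn instance types; the text lists it alongside Inferentia and Graviton for inference at scale.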
transfer learning: A training approach that reuses knowledge from one model or task for another.
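
The time-based splits and time-series CV entries above both keep validation data later in time than training data. A minimal sketch of one common variant, an expanding training window, is shown below; the function name and parameters are hypothetical, and the sketch assumes the samples are already sorted by time.

```python
def expanding_window_splits(n_samples, n_splits, val_size):
    """Yield (train_indices, validation_indices) with validation always after training."""
    first_train_end = n_samples - n_splits * val_size
    if first_train_end <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        train_end = first_train_end + i * val_size
        train = list(range(train_end))  # everything before the validation window
        validation = list(range(train_end, train_end + val_size))
        yield train, validation


if __name__ == "__main__":
    for train, validation in expanding_window_splits(10, 3, 2):
        print(train, validation)
    # [0, 1, 2, 3] [4, 5]
    # [0, 1, 2, 3, 4, 5] [6, 7]
    # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Unlike k-fold, no validation sample ever precedes a training sample, which avoids leaking future information into training.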

U

underfitting: A modeling problem where a model is too simple to capture the underlying pattern in the data.
undersampling: A class-imbalance technique that reduces the representation of majority classes.

V

VPC endpoints: Private network endpoints used to access SageMaker without traversing the public internet.
VPC isolation: The practice of running training and inference inside a virtual private cloud to isolate resources from broader network access.

X

XGBoost: A gradient-boosted decision tree algorithm, available as a SageMaker built-in algorithm and listed as a model selection option.

About These Definitions

These definitions are loaded from the shared release pack. Use them with the study guide and practice questions to connect vocabulary to exam scenarios.
