MLA-C01 Exam Glossary - 184 Terms

Terminology for the AWS Certified Machine Learning Engineer - Associate exam. Use these definitions with the study guide and practice questions.

A

accuracy
A classification metric: the fraction of all predictions that are correct.
Amazon AppFlow
An AWS data ingestion service, listed for ML workloads, that transfers data between SaaS applications and AWS services.
Amazon Bedrock foundation models
Foundation models available through Amazon Bedrock, listed as an option alongside built-in, pre-trained, and custom models.
Amazon CloudWatch
An AWS monitoring service used here to track machine learning endpoint metrics such as latency, error rate, invocations, and model latency.
Amazon EBS
An AWS block storage service listed as a storage option for ML workloads.
Amazon EFS
An AWS shared file storage service listed as a storage option for ML workloads.
Amazon EMR
An AWS managed big data service, listed for distributed data transformation with frameworks such as Apache Spark.
Amazon EventBridge
An AWS event bus service used for event-driven ML triggers.
Amazon Forecast
An AWS AutoML service for forecasting.
Amazon FSx for Lustre
An AWS storage service used for ML workloads, noted for high-throughput training reads.
Amazon Kinesis Data Streams
An AWS data ingestion service for streaming data.
Amazon MSK
Amazon Managed Streaming for Apache Kafka, a streaming data ingestion service listed for ML workloads.
Amazon Personalize
An AWS AutoML service for recommendations and personalization.
Amazon S3
An AWS storage service used for ML workloads; the text notes Standard, Intelligent-Tiering, and Glacier storage classes.
anomaly detection
A machine learning problem framing focused on identifying unusual or outlier observations.
Apache Spark
A distributed data processing framework used for transformation.
asynchronous
A SageMaker endpoint type that processes inference requests asynchronously.
audit trails
Records of actions and events used to support auditing and traceability.
Auto Scaling
A scaling capability used for inference at scale.
AutoML
Automated machine learning, represented in the text by SageMaker Autopilot, Amazon Forecast, and Amazon Personalize.
Avro
A row-based data serialization format with an embedded schema, used for ingesting and storing data.
AWS AI Service Cards
AWS documentation artifacts that provide information about AI services and their intended use.
AWS Budgets
An AWS service used to set and monitor cost budgets and track spending.
AWS CDK
The AWS Cloud Development Kit, an Infrastructure as Code tool.
AWS CloudFormation
An AWS Infrastructure as Code service.
AWS Cost Explorer
An AWS service used to analyze and track cloud costs.
AWS Database Migration Service (DMS)
An AWS data ingestion service, listed for ML workloads, that migrates and replicates data from databases.
AWS Glue
An AWS data ingestion and ETL service used for ML data preparation and transformation.
AWS Glue Data Quality
An AWS service for assessing data quality.
AWS Glue DataBrew
A visual, no-code data preparation tool that uses recipes for data prep.
AWS Glue ETL
An AWS Glue-based extract-transform-load capability used for data transformation.
AWS IoT Greengrass
An AWS service used for edge deployment.
AWS Lambda
An AWS service used for lightweight data transforms.

B

batch ingestion
A data ingestion pattern that ingests data in discrete batches, contrasted in the text with streaming ingestion.
batch normalization
A model training technique that normalizes layer inputs over each mini-batch to stabilize and speed up training.
batch transform
A SageMaker inference mode that processes data in batches rather than through an always-on endpoint.
Bayesian
A hyperparameter tuning strategy that uses the results of earlier trials to choose the next hyperparameter values to try.
BERT
A transformer-based language model architecture, listed in the text for model selection.
bias detection
The process of identifying bias in datasets; the text specifically mentions pre-training bias metrics.
bias drift
A change over time in the bias characteristics of a model or its outputs.
bias-variance tradeoff
The tradeoff between model bias and variance used to reason about overfitting and underfitting.
BLEU
An NLP evaluation metric that scores generated text by its n-gram overlap with reference text; commonly used for machine translation.
blue/green deployment
A deployment strategy that shifts traffic between two environments to reduce downtime and risk during releases.

C

canary deployment
A deployment strategy that releases a model to a small subset of traffic first to validate behavior before broader rollout.
CI/CD
An abbreviation for continuous integration and continuous delivery, referring to automated build, test, and deployment pipelines.
class weighting
A class-imbalance technique that assigns different weights to classes during model training.
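
One common way to choose class weights is inverse class frequency. The sketch below assumes the "balanced" heuristic, weight = n_samples / (n_classes * class_count); the function name `balanced_class_weights` is illustrative and not something the text defines.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class that is inversely proportional to its frequency.

    Assumption: uses the "balanced" heuristic
    n_samples / (n_classes * class_count). Other weighting schemes exist.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Eight majority-class examples (0) and two minority-class examples (1).
weights = balanced_class_weights([0] * 8 + [1] * 2)
print(weights)
```

With these weights, each minority example contributes more to the training loss than each majority example, so both classes carry the same total weight.
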
classification
A machine learning problem framing where the model predicts discrete classes.
CloudTrail
An AWS service that records API activity for auditing and compliance.
CloudWatch alarms
Configured alerts in Amazon CloudWatch that trigger when monitored metrics cross defined thresholds.
CloudWatch dashboards
Custom visual displays in Amazon CloudWatch used to track metrics and KPIs.
clustering
A machine learning problem framing where the model groups similar data points.
CodeBuild
An AWS service used with SageMaker Projects to build components in machine learning CI/CD pipelines.
CodePipeline
An AWS service used with SageMaker Projects to orchestrate machine learning CI/CD pipelines.
concept drift
A change in the relationship between input data and the target outcome over time.
confusion matrix
A table used to evaluate classification performance by comparing predicted and actual classes.
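
As an illustration, the four cells of a binary confusion matrix are enough to derive accuracy, precision, recall, and F1. This is a minimal sketch with hypothetical helper names (`confusion_counts`, `classification_metrics`) and made-up labels.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(actual, predicted, positive=1):
    """Derive common classification metrics from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only; 1 is treated as the positive class.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))
print(classification_metrics(actual, predicted))
```
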
continuous delivery
A software delivery practice where validated changes are automatically prepared for release through a pipeline, enabling frequent deployment to production or other environments.
continuous integration
A software delivery practice where code changes are frequently merged and validated through automated builds and tests as part of a pipeline.
cost allocation
The process of assigning cloud costs to specific workloads, teams, or resources.
cross-validation
A model evaluation technique that repeatedly trains and validates on different data splits.
CSV
A comma-separated data format used for ingesting and storing tabular data.

D

data drift
A change in the distribution of input data over time.
data quality
The quality of data as measured by completeness, consistency, accuracy, and deduplication.
data quality drift
A change in the statistical properties or quality of input data over time.
DeepAR
A SageMaker built-in forecasting algorithm based on recurrent neural networks, listed in the text for model selection.
distributed training
A training approach that spreads model training across multiple resources.
dropout
A regularization technique that randomly deactivates units during training to reduce overfitting.

E

early stopping
A training control technique, listed in the text as a regularization method, that halts training when validation performance stops improving.
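
A patience-based rule is one common way to implement early stopping. The sketch below is illustrative only: the function name `early_stopping_epoch`, the `patience` parameter, and the loss values are assumptions, not part of the text.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve on the best value
    for `patience` consecutive epochs. If that never happens, the stop
    epoch is the last epoch in the sequence.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per epoch.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
print(early_stopping_epoch(losses, patience=2))
```

In this example training halts at epoch 4 and the model from epoch 2 would be kept, even though a later epoch would have improved further; a larger `patience` trades extra training time for a lower risk of stopping too soon.
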
ECS
Amazon Elastic Container Service, an AWS container orchestration service listed for custom inference.
EKS
Amazon Elastic Kubernetes Service, an AWS managed Kubernetes service listed for custom inference.
embeddings
Dense vector representations of data, used as derived features during feature engineering.
error rate
The proportion of requests that result in errors.
execution roles
IAM roles used by SageMaker to perform actions on behalf of a job or service.

F

F1
A classification metric: the harmonic mean of precision and recall.
Fargate
An AWS serverless compute engine for containers, listed for custom inference container orchestration.
feature attribution drift
A change over time in how much individual features contribute to model predictions.
feature engineering
The process of creating or transforming input variables, including aggregations, time-window features, embeddings, and derived features.
file mode
A SageMaker training data input mode that downloads the full dataset to the training instance's storage before training starts.
fine-tuning
A training approach that adapts a pre-trained model to a specific task or dataset.
forecasting
A machine learning problem framing for predicting future values or trends.

G

Glacier
An Amazon S3 archival storage class for infrequently accessed data, listed as an option for ML workloads.
Graviton
A family of AWS Arm-based processors; Graviton-based instances are listed for inference at scale.
grid
A hyperparameter tuning strategy that evaluates every combination from a predefined set of hyperparameter values.

H

Hyperband
A hyperparameter tuning strategy that allocates more resources to promising configurations and stops poorly performing ones early.
hyperparameter tuning
The process of searching for the best hyperparameter values for a model.

I

IAM
An acronym for AWS Identity and Access Management, used to control access to AWS resources.
imputation
A data preparation technique for filling in missing values.
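
Mean imputation is one of the simplest forms of this technique. The sketch below assumes missing values are represented as `None`; the function name `impute_mean` is illustrative.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical feature column with one missing value.
print(impute_mean([1.0, None, 3.0]))
```

To avoid leaking information, the mean would normally be computed on the training split only and then reused for the validation and test splits.
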
inference endpoint
A deployed endpoint that serves model predictions for inference requests.
inference pipelines
A SageMaker deployment pattern that chains multiple containers or steps for inference.
Inferentia
An AWS purpose-built ML accelerator for inference; Inferentia-based instances are listed for inference at scale.
Infrastructure as Code
A way to define and manage infrastructure programmatically rather than manually.
instance types
The specific compute configurations used for AWS workloads.
Intelligent-Tiering
An Amazon S3 storage class that automatically moves objects between access tiers based on access patterns; listed as an option for ML workloads.
invocations
The number of times an ML endpoint is called.

J

JSON
A structured data format used for ingesting and storing data.

K

k-fold
A cross-validation method that splits data into k folds and rotates the validation fold.
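
The fold rotation can be sketched in a few lines. This is a minimal illustration without shuffling; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs, one per fold.

    Samples are split into k contiguous folds; each fold serves as the
    validation set exactly once. No shuffling is done in this sketch.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits


for train, validation in k_fold_indices(5, 2):
    print("train:", train, "validation:", validation)
```
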
k-means
A clustering algorithm that partitions data into k clusters by assigning each point to the nearest cluster centroid.
Kinesis Data Firehose
An AWS data ingestion service for delivering streaming data.
KMS
An acronym for AWS Key Management Service, used for encryption at rest.
KPI
An acronym for key performance indicator, a metric used to measure performance.

L

L1
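A regularization method that penalizes the sum of the absolute values of the weights, which encourages sparse weights.
L2
A regularization method that penalizes the sum of the squared weights, which shrinks weights toward zero.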
latency
The time taken for an ML endpoint to return a response.
learning curves
Plots used to assess model performance as training data size or training progress changes.
least-privilege policies
Access policies that grant only the minimum permissions required to perform a task.
linear regression
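A regression algorithm that models the target as a linear function of the input features.
logistic regression
A classification algorithm that models class probability with a logistic (sigmoid) function of a linear combination of the features.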

M

MAE
A regression metric: mean absolute error, the average absolute difference between predicted and actual values.
MAP
A ranking metric: mean average precision, the mean over queries of the average precision of each ranked result list.
Mechanical Turk
Amazon Mechanical Turk, a crowdsourcing marketplace mentioned as a workforce option for labeling data.
model approval workflows
Approval processes used to gate model promotion before deployment.
model latency
The time taken by the model portion of an inference request to produce a prediction.
model lineage
The traceable history of a model, including its origins, changes, and related artifacts.
model quality drift
A decline or change in model performance over time as monitored in production.
model registry
A repository for storing and managing approved machine learning models and their metadata.
model versioning
The practice of tracking and managing different versions of a machine learning model.
multi-container endpoints
SageMaker endpoints that run multiple containers for inference.
multi-model endpoints
SageMaker endpoints that host multiple models on a single endpoint.

N

NDCG
A ranking metric: normalized discounted cumulative gain, which rewards placing the most relevant items near the top of a ranked list.
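
A small worked example may help. The sketch below assumes the linear-gain form of DCG, where each relevance score is divided by log2(rank + 1) with ranks starting at 1; an exponential-gain variant also exists. The function names `dcg` and `ndcg` are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain (an assumed variant)."""
    # enumerate() is 0-based, so rank + 2 gives log2(2) for the first item.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    top = relevances[:k] if k is not None else relevances
    ideal = sorted(relevances, reverse=True)[: len(top)]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance scores in the order the model ranked the items (made-up values).
print(ndcg([3, 2, 1]))  # ranking already ideal
print(ndcg([1, 2, 3]))  # most relevant item ranked last
```
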
neural networks
A class of machine learning models listed in the text.

O

one-hot encoding
A data encoding technique that represents categorical values as binary vectors.
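
For illustration, a one-hot encoder can be written directly in Python. The function name `one_hot_encode` and the sample values are hypothetical; column order here is simply the sorted category order.

```python
def one_hot_encode(values):
    """Return (categories, rows) where each row has a single 1 marking its category."""
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        rows.append(row)
    return categories, rows


categories, rows = one_hot_encode(["red", "green", "red", "blue"])
print(categories)
for row in rows:
    print(row)
```

Because each category gets its own column, the encoding implies no ordering among categories, which is the main contrast with ordinal encoding.
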
ORC
A columnar data format used for ingesting and storing data.
ordinal encoding
A data encoding technique that maps categories to an ordered numeric representation.
overfitting
A modeling problem where a model fits training data too closely and generalizes poorly.
oversampling
A class-imbalance technique that increases the representation of minority classes.

P

Parquet
A columnar data format used for ingesting and storing data, especially in ML workflows.
pipe mode
A SageMaker training data input mode that streams data from Amazon S3 to the training container instead of downloading it first.
PR AUC
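A classification metric: the area under the precision-recall curve, often preferred when classes are imbalanced.
precision
A classification metric: the fraction of predicted positives that are actually positive, TP / (TP + FP).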
prompt engineering
The practice of crafting prompts for foundation models.

R

R-squared (R²)
A regression metric: the coefficient of determination, the proportion of variance in the target that the model explains.
random
A hyperparameter tuning strategy that samples hyperparameter values at random from defined ranges.
RCF
Random Cut Forest, a SageMaker built-in algorithm for anomaly detection, listed in the text for model selection.
real-time
A SageMaker endpoint type that serves inference synchronously.
recall
A classification metric: the fraction of actual positives that the model identifies, TP / (TP + FN).
recommendation
A machine learning problem framing for suggesting items or content to users.
RecordIO
A binary, record-based data format used for ingesting and storing data; SageMaker built-in algorithms commonly accept its protobuf variant.
regression
A machine learning problem framing where the model predicts continuous values.
regularization
A set of techniques used to reduce overfitting, including L1, L2, dropout, and early stopping.
residual plots
Plots used to analyze prediction errors, especially for regression.
resource tagging
The practice of assigning tags to resources so costs and usage can be tracked and allocated.
right-sizing
The practice of selecting appropriately sized instance types to match workload needs and reduce cost.
RMSE
A regression metric: root mean squared error, the square root of the average squared difference between predicted and actual values.
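
To make the difference between RMSE and MAE concrete, the sketch below computes both on the same made-up predictions. The function names `mae` and `rmse` are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Illustrative values only; the errors are 1, 0, -2, and 0.
actual = [3, 5, 2, 7]
predicted = [2, 5, 4, 7]
print(mae(actual, predicted))
print(rmse(actual, predicted))
```

RMSE comes out larger than MAE here because squaring gives the single error of 2 more influence than the error of 1.
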
ROC AUC
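A classification metric: the area under the receiver operating characteristic curve, which plots true positive rate against false positive rate across thresholds.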
role chaining
A pattern in which one IAM role assumes another role to access additional permissions.
ROUGE
An NLP evaluation metric that scores generated text by its overlap with reference text; commonly used for summarization.

S

SageMaker Automatic Model Tuning
An AWS SageMaker service for hyperparameter tuning.
SageMaker Autopilot
An AWS AutoML service.
SageMaker built-in algorithms
Predefined SageMaker algorithms used instead of custom models or pre-trained models.
SageMaker Clarify
An AWS SageMaker tool used for pre-training and post-training bias metrics, and for explainability with SHAP.
SageMaker Data Wrangler
An AWS SageMaker tool used for data preparation and transformation.
SageMaker Edge Manager
An AWS SageMaker service for edge deployment.
SageMaker endpoint
A SageMaker deployment target for model inference.
SageMaker Experiments
An AWS SageMaker service for tracking metrics across training runs.
SageMaker Ground Truth
An AWS SageMaker data labeling service.
SageMaker Ground Truth Plus
An AWS SageMaker turnkey data labeling service in which an AWS-managed workforce labels the data.
SageMaker Inference Recommender
An AWS SageMaker tool used to recommend optimal endpoint sizing for inference workloads.
SageMaker JumpStart
A SageMaker capability for using pre-trained models.
SageMaker Model Cards
Documentation artifacts for SageMaker models that describe model details and usage information.
SageMaker Model Monitor
An AWS SageMaker feature used to monitor model performance and data quality, including drift detection.
SageMaker Pipelines
An AWS SageMaker workflow service for ML pipelines such as preprocessing, training, evaluation, and deployment.
SageMaker Projects
An AWS SageMaker capability used to set up machine learning CI/CD workflows, including integration with CodePipeline and CodeBuild.
SageMaker training jobs
SageMaker jobs used to train models; the text mentions spot training, distributed training, pipe mode, and file mode.
savings plans
An AWS pricing model used to reduce compute costs through committed usage.
Secrets Manager
An AWS service used to store and manage credentials and other secrets.
serverless
A SageMaker endpoint type that runs inference without managing servers.
shadow deployment
A safe rollout pattern where a new model receives mirrored traffic for evaluation without affecting live responses.
SHAP
A model explainability method used by SageMaker Clarify.
SMOTE
A technique used to address class imbalance by generating synthetic minority examples.
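
The core idea, interpolating between a minority example and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration rather than a full SMOTE implementation; the function name `smote_like_samples` and its parameters are assumptions.

```python
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority points and their neighbors.

    Simplified sketch: picks a random minority point, picks one of its k
    nearest minority neighbors (Euclidean distance), and interpolates at a
    random position on the line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical two-feature minority-class examples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
for point in smote_like_samples(minority, n_new=3):
    print(point)
```

Unlike plain oversampling, which duplicates existing rows, each synthetic point is a new value that lies between two real minority examples.
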
spot training
A cost optimization approach that uses spare AWS capacity for training jobs at lower cost.
Step Functions
An AWS orchestration service used for cross-service ML workflows.
stratified sampling
A sampling method that preserves class proportions when splitting data.
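
A stratified train/test split can be sketched by splitting each class separately. The function name `stratified_split`, the `test_fraction` parameter, and the fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_indices, test_indices) with class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Imbalanced labels: eight examples of class 0 and two of class 1.
labels = [0] * 8 + [1] * 2
train, test = stratified_split(labels, test_fraction=0.5)
print("train:", train)
print("test:", test)
```

Both splits keep the original 4:1 class ratio, which a purely random split of such a small dataset would not guarantee.
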
streaming ingestion
A data ingestion pattern that continuously ingests data as it is produced, contrasted in the text with batch ingestion.

T

target encoding
A data encoding technique that replaces each category with a statistic of the target, such as the mean target value for that category.
time-based splits
A data splitting method that partitions data according to time order.
time-series CV
A cross-validation method designed for time-ordered data.
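
One common scheme is the expanding window, where each validation block comes strictly after its training data. The sketch below assumes that scheme; the function name `expanding_window_splits` and its parameters are hypothetical.

```python
def expanding_window_splits(n_samples, n_splits, val_size):
    """Return (train_indices, validation_indices) pairs in time order.

    Each split trains on everything before its validation block, so the
    model is never validated on data older than its training data.
    """
    splits = []
    for i in range(n_splits):
        val_end = n_samples - (n_splits - 1 - i) * val_size
        val_start = val_end - val_size
        if val_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train = list(range(val_start))
        validation = list(range(val_start, val_end))
        splits.append((train, validation))
    return splits


for train, validation in expanding_window_splits(6, n_splits=2, val_size=2):
    print("train:", train, "validation:", validation)
```
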
TLS
An acronym for Transport Layer Security, used for encryption in transit.
train/validation/test
A data splitting scheme that divides data into training, validation, and test sets.
Trainium
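An AWS purpose-built ML accelerator designed mainly for model training; the text lists it with Inferentia and Graviton among instance options for inference at scale.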
transfer learning
A training approach that reuses knowledge from one model or task for another.

U

underfitting
A modeling problem where a model is too simple to capture the underlying pattern in the data.
undersampling
A class-imbalance technique that reduces the representation of majority classes.

V

VPC endpoints
Private network endpoints used to access SageMaker without traversing the public internet.
VPC isolation
The practice of running training and inference inside a virtual private cloud to isolate resources from broader network access.

X

XGBoost
A gradient-boosted decision tree algorithm, listed in the text for model selection.
