NCA-GENL Exam Glossary - 35 Terms

Terminology for the NVIDIA Certified Associate: Generative AI LLMs (NCA-GENL) exam. Use these definitions with the study guide and practice questions.

A

Attention mask: A masking mechanism that prevents attention to padding tokens or disallowed positions such as future tokens in causal models.
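
Example (attention mask): a minimal pure-Python sketch of how a padding mask and a causal mask can be combined, then applied before the softmax. The function names `build_attention_mask` and `masked_softmax` are hypothetical and chosen for illustration; real frameworks typically apply the mask to a tensor of scores instead.

```python
import math

def build_attention_mask(is_real_token, causal=True):
    """Return an n x n matrix; mask[i][j] is True if query i may attend to key j."""
    n = len(is_real_token)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            allowed = is_real_token[j]      # never attend to padding keys
            if causal and j > i:            # never attend to future positions
                allowed = False
            row.append(allowed)
        mask.append(row)
    return mask

def masked_softmax(scores, allowed):
    """Softmax over one row of scores; disallowed positions get zero weight."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    if not kept:
        return [0.0] * len(scores)
    peak = max(kept)                        # subtract the max for numerical stability
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

# Three positions, the last one is padding.
mask = build_attention_mask([True, True, False])
weights = masked_softmax([1.0, 1.0, 5.0], mask[1])
```

Here the padding position receives zero weight even though its raw score is the largest.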

B

BERT: A bidirectional transformer-based language model that uses token, position, and segment information for NLP tasks.
BLEU: A machine translation evaluation metric based on n-gram overlap between generated text and reference text.
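
Example (BLEU): a minimal sketch of the clipped ("modified") n-gram precision that BLEU is built on. It covers a single reference and a single n; the full metric also combines several n-gram orders and applies a brevity penalty, which are omitted here. The helper names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit to the number of times it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total

reference = "the cat is on the mat".split()
repeated = modified_ngram_precision("the the the the".split(), reference, 1)
bigram = modified_ngram_precision("the cat sat on the mat".split(), reference, 2)
```

Clipping is what stops a degenerate candidate such as "the the the the" from scoring a perfect unigram precision.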

C

Catastrophic forgetting: The loss of previously learned knowledge when a model is fine-tuned on new task-specific data.

D

Distributed training: Training a model across multiple devices or nodes to accelerate computation and scale to larger workloads.
Dropout: A regularization technique that randomly sets some neuron outputs to zero during training to reduce overfitting.
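
Example (dropout): a minimal sketch of "inverted" dropout on a list of activations. Rescaling the kept values by 1 / (1 - drop_prob) is an assumption here; it is a common variant that lets inference run without any extra scaling. The function name and parameters are hypothetical.

```python
import random

def dropout(activations, drop_prob, training=True, rng=None):
    """Zero each activation with probability drop_prob during training."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)            # dropout is disabled at inference time
    rng = rng or random.Random()
    keep_prob = 1.0 - drop_prob
    # Kept values are scaled up so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

inputs = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(inputs, 0.5, training=True, rng=random.Random(0))
eval_out = dropout(inputs, 0.5, training=False)
```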

E

Exploratory Data Analysis (EDA): Initial analysis of a dataset to uncover patterns, anomalies, quality issues, class imbalance, and feature relationships before model training or fine-tuning.

F

Feed-forward network: The position-wise fully connected sublayer in each transformer block that applies nonlinear transformations to token representations.
Fine-tuning: Adapting a pre-trained language model to a specific downstream task or application using task-specific data.

G

Gradient accumulation: A technique that sums gradients across multiple mini-batches before performing a weight update to simulate larger batch sizes.
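
Example (gradient accumulation): a minimal sketch using a one-parameter model y_hat = w * x with mean squared error. Each micro-batch gradient is divided by the number of accumulation steps so that the accumulated value matches the gradient of one large batch; that scaling choice, the toy model, and all names are assumptions for illustration.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error for y_hat = w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def train_with_accumulation(data, w, lr, micro_batch_size, accumulation_steps):
    """One pass over data, updating w once every accumulation_steps micro-batches."""
    accumulated = 0.0
    steps_seen = 0
    for start in range(0, len(data), micro_batch_size):
        batch = data[start:start + micro_batch_size]
        accumulated += mse_gradient(w, batch) / accumulation_steps
        steps_seen += 1
        if steps_seen == accumulation_steps:
            w -= lr * accumulated           # single weight update for the whole group
            accumulated = 0.0
            steps_seen = 0
    # Gradients from an incomplete final group are discarded in this sketch.
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_accumulated = train_with_accumulation(data, 0.0, 0.01, micro_batch_size=2, accumulation_steps=2)
w_full_batch = 0.0 - 0.01 * mse_gradient(0.0, data)
```

With equal-sized micro-batches, the two resulting weights agree, which is the sense in which accumulation simulates a larger batch.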

I

Inference optimization: Techniques and tools used to improve model serving efficiency, latency, throughput, and resource usage during prediction.
Internal covariate shift: Changes in the distribution of layer inputs during training that can make optimization less stable.

K

Knowledge distillation: A model compression technique where a smaller student model learns to mimic a larger teacher model.
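
Example (knowledge distillation): a minimal sketch of a soft-label loss, assuming the commonly used formulation in which teacher and student logits are softened with a temperature and compared with KL divergence. The function names and the default temperature are hypothetical; practical recipes often add a standard hard-label loss as well.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's soft labels to the student's distribution."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0.0)

matching = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.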

L

Latency: The time delay between submitting a request to a model and receiving the response.
Layer normalization: A normalization method that stabilizes training by normalizing activations within a layer across features.
Learning rate scheduling: The process of adjusting the learning rate over time to improve optimization and training stability.
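
Example (layer normalization): a minimal sketch that normalizes one token's feature vector to roughly zero mean and unit variance. The learned gain and bias that normally follow are omitted, and the `eps` default is an assumption.

```python
import math

def layer_norm(features, eps=1e-5):
    """Normalize a single feature vector across its features."""
    n = len(features)
    mean = sum(features) / n
    variance = sum((x - mean) ** 2 for x in features) / n
    # eps keeps the division stable when the variance is close to zero.
    return [(x - mean) / math.sqrt(variance + eps) for x in features]

normalized = layer_norm([1.0, 2.0, 3.0, 4.0])
```

Unlike batch-level statistics, the mean and variance here come from a single example, so the result does not depend on batch size.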

M

Mini-batch: A small subset of training data processed in one forward and backward pass during optimization.
Model portability: The ability to move and run a model across different tools, frameworks, or deployment environments.
Model quantization: A compression method that reduces the numerical precision of weights and activations to lower memory use and speed up computation.
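
Example (model quantization): a minimal sketch of symmetric 8-bit quantization with a single scale for the whole list. Symmetric per-tensor scaling is only one possible scheme and is assumed here for simplicity; the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25]
q_weights, q_scale = quantize_int8(weights)
restored = dequantize(q_weights, q_scale)
```

The restored values differ from the originals by at most about half a quantization step, which is the precision traded for the smaller representation.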

N

n-gram precision: A measure used in BLEU that evaluates how many contiguous token sequences in generated text match the reference.
Named Entity Recognition (NER): An NLP task that identifies and classifies entities such as people, organizations, and locations in text.
Nucleus sampling (top-p): A text generation strategy that samples the next token from the smallest set of highest-probability tokens whose cumulative probability reaches or exceeds a threshold p.
NVIDIA NeMo: An NVIDIA framework for building, training, and deploying conversational AI, speech, and NLP models; it also provides pre-trained models.
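
Example (nucleus sampling): a minimal sketch over a small token-to-probability dictionary. The function names are hypothetical, and the distribution is a made-up illustration, not model output.

```python
import random

def nucleus_candidates(probs, p):
    """Smallest set of highest-probability tokens whose total probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    return kept

def sample_top_p(probs, p, rng=None):
    """Sample one token from the nucleus; random.choices renormalizes the weights."""
    rng = rng or random.Random()
    kept = nucleus_candidates(probs, p)
    tokens = [token for token, _ in kept]
    weights = [prob for _, prob in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]

next_token_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
nucleus = [token for token, _ in nucleus_candidates(next_token_probs, 0.9)]
choice = sample_top_p(next_token_probs, 0.9, rng=random.Random(0))
```

With p = 0.9 the low-probability token "d" is never sampled, while the remaining tokens keep their relative odds.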

O

ONNX: The Open Neural Network Exchange format, an open standard that lets trained models be transferred across frameworks and runtimes.

P

Positional encoding: A mechanism that injects token order information into transformer inputs so sequence position can be modeled.
Pre-trained models: Models trained in advance on large datasets and later reused or adapted for downstream tasks.
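
Example (positional encoding): a minimal sketch of the fixed sinusoidal scheme from the original Transformer design, in which even dimensions use sine and odd dimensions use cosine. This is one option among several; some models, BERT included, learn their position embeddings instead. The function name is hypothetical.

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of position encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each sine/cosine pair shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

encoding = sinusoidal_positional_encoding(num_positions=3, d_model=4)
```

Each row is added to the token embedding at the same position, so identical tokens at different positions end up with different inputs.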

R

Regularization: Methods used to improve generalization and reduce overfitting in machine learning models.

S

Segment embeddings: Embeddings in BERT that indicate which tokens belong to sentence A versus sentence B in paired-input tasks.
Self-attention: A mechanism that allows each token in a sequence to attend to every token in that sequence, including itself, to build contextualized representations.
Student model: A smaller model trained to replicate the performance or behavior of a larger teacher model.
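
Example (self-attention): a minimal sketch of scaled dot-product self-attention in which the input vectors serve directly as queries, keys, and values. Skipping the learned projection matrices is a simplification assumed here; the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Return one contextualized vector per input vector."""
    d = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in vectors]
        weights = softmax(scores)
        # Weighted average of all value vectors.
        outputs.append([sum(w * value[j] for w, value in zip(weights, vectors))
                        for j in range(d)])
    return outputs

contextualized = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```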

T

Teacher model: A larger or more capable model that provides target behavior or soft labels for training a smaller model in distillation.
TensorRT: An NVIDIA SDK for optimizing trained neural networks for deployment, especially for low-latency, high-throughput inference.
Text generation: The process of producing natural language output from a language model based on input context and decoding strategy.
Transformer: A neural network architecture based on attention mechanisms, widely used for language modeling and sequence tasks.

W

Warm-up: A learning rate scheduling strategy that gradually increases the learning rate at the beginning of training.
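
Example (warm-up): a minimal sketch of a linear warm-up followed by a constant learning rate. The linear shape, the constant phase, and the names `base_lr` and `warmup_steps` are assumptions for illustration; many schedules decay the rate after warm-up instead.

```python
def learning_rate_at(step, base_lr, warmup_steps):
    """Learning rate for a zero-based step under linear warm-up."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up in equal increments, reaching base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    return base_lr

schedule = [learning_rate_at(step, base_lr=1.0, warmup_steps=4) for step in range(6)]
```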

About These Definitions

Use these definitions with the study guide and practice questions to connect vocabulary to exam scenarios.