Databricks Certified Generative AI Engineer Associate Study Guide
Use the domain outline below to connect each exam domain (designing applications, data preparation, application development, assembling and deploying applications, governance, and evaluation and monitoring) to scenario-based questions and explanations.
How the Exam Is Structured
The Databricks Certified Generative AI Engineer Associate (GenAI Associate) exam validates skills in designing applications, data preparation, application development, and assembling and deploying applications, along with governance and evaluation and monitoring. The ExamPal practice bank includes 322 premium questions and 40 free questions mapped across the official blueprint.
| Domain | Weight | Focus |
|---|---|---|
| Domain 1: Design Applications | 14% | Design a prompt that elicits a specifically formatted response |
| Domain 2: Data Preparation | 14% | Apply a chunking strategy for a given document structure and model constraints; Filter extraneous content in source documents that degrades quality of a RAG application |
| Domain 3: Application Development | 30% | Select LangChain or similar tools for use in a Generative AI application; Qualitatively assess responses to identify common issues such as quality and safety |
| Domain 4: Assembling and Deploying Applications | 22% | Code a chain using a pyfunc model with pre- and post-processing; Control access to resources from model serving endpoints |
| Domain 5: Governance | 8% | Use masking techniques as guard rails to meet a performance objective |
| Domain 6: Evaluation and Monitoring | 12% | Select an LLM choice (size and architecture) based on a set of quantitative evaluation metrics; Select the best LLM based on the attributes of the application to be developed |
14% of exam
Domain 1: Design Applications
This section covers how to design LLM-enabled applications in Databricks by translating business requirements into prompts, model tasks, chain components, and AI pipeline inputs/outputs. It also includes selecting and ordering tools for multi-stage reasoning and deciding when to use Agent Bricks capabilities.
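As a minimal sketch of designing a prompt that elicits a specifically formatted response, the example below asks for a JSON object with fixed keys and then validates the reply. The ticket-classification task, the template wording, and the names `PROMPT_TEMPLATE`, `build_prompt`, and `parse_response` are all hypothetical and are not taken from the exam guide.

```python
import json

# Hypothetical template: the task, keys, and wording are illustrative only.
PROMPT_TEMPLATE = (
    "Classify the support ticket below.\n"
    "Respond with a single JSON object and nothing else, using exactly these keys:\n"
    '  "category": one of {categories}\n'
    '  "urgent": true or false\n\n'
    "Ticket: {ticket}"
)


def build_prompt(ticket: str, categories: list[str]) -> str:
    """Fill the template with the ticket text and the allowed categories."""
    return PROMPT_TEMPLATE.format(ticket=ticket, categories=json.dumps(categories))


def parse_response(raw: str, categories: list[str]) -> dict:
    """Check that a model reply matches the format the prompt asked for."""
    data = json.loads(raw)  # raises ValueError if the reply is not valid JSON
    if set(data) != {"category", "urgent"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if data["category"] not in categories:
        raise ValueError(f"unknown category: {data['category']}")
    if not isinstance(data["urgent"], bool):
        raise ValueError("urgent must be a boolean")
    return data
```

Validating the reply in code, rather than trusting the prompt alone, is one way to catch responses that drift from the requested format before they reach downstream chain components.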
14% of exam
Domain 2: Data Preparation
This section covers preparing source data for retrieval-augmented generation (RAG) workflows, including chunking, filtering noisy content, selecting extraction tools, and loading chunked text into Delta Lake tables in Unity Catalog. It also addresses source document selection, retrieval evaluation, advanced chunking strategies, and the role of re-ranking in retrieval systems.
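A fixed-size chunking strategy with overlap can be sketched as follows. This is an illustration under stated assumptions, not a recommended configuration: it counts whitespace-separated words rather than model tokens, and the function name `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that share `overlap` words.

    Assumption: words stand in for tokens; a real pipeline would usually
    measure chunk size with the tokenizer of the embedding model in use.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the current window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.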
30% of exam
Domain 3: Application Development
This section covers practical skills for building generative AI applications, including tool selection, prompt construction, retrieval design, model selection, guardrails, and evaluation/monitoring. It also includes agentic and multi-agent system development using MLflow, Agent Framework, Genie Spaces, and conversational APIs.
22% of exam
Domain 4: Assembling and Deploying Applications
This section covers how to assemble AI applications using chains, retrieval, vector search, and model serving patterns. It also includes deployment, CI/CD, prompt lifecycle management, MCP server integration, and user-facing interfaces for agent scenarios.
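One objective in this domain is coding a chain as a pyfunc model with pre- and post-processing. The sketch below mirrors the `predict(context, model_input)` shape of `mlflow.pyfunc.PythonModel` but deliberately does not import MLflow, so it runs anywhere. The class name `ChainModel`, the prompt wording, and the `fake_llm` stand-in are hypothetical.

```python
from typing import Callable


class ChainModel:
    """Follows the predict(context, model_input) shape of a pyfunc model.

    Assumption: in a real project this class would subclass
    mlflow.pyfunc.PythonModel; the base class is omitted here so the
    sketch has no dependencies.
    """

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm  # hypothetical callable: prompt in, completion out

    def _preprocess(self, question: str) -> str:
        # Pre-processing: clean the input and wrap it in a prompt.
        return f"Answer concisely.\nQuestion: {question.strip()}"

    def _postprocess(self, completion: str) -> str:
        # Post-processing: tidy the raw completion before returning it.
        return completion.strip()

    def predict(self, context, model_input: list[str]) -> list[str]:
        return [self._postprocess(self.llm(self._preprocess(q))) for q in model_input]


def fake_llm(prompt: str) -> str:
    """Stand-in for a served model; echoes the last line of the prompt."""
    return "  echo: " + prompt.splitlines()[-1] + "  "
```

Calling `ChainModel(fake_llm).predict(None, ["What is RAG?"])` runs each question through pre-processing, the model call, and post-processing in order.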
8% of exam
Domain 5: Governance
This section covers governance practices for GenAI applications, with emphasis on guardrails, masking techniques, and mitigation strategies that support performance objectives and reduce risk. It also addresses protecting applications from malicious user inputs and ensuring data sources comply with legal and licensing requirements.
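Masking as a guardrail can be illustrated with a small regex-based filter applied to text before it is sent to a model or written to a log. This is a simplified sketch: the two patterns (email addresses and US Social Security numbers) and the name `mask_sensitive` are assumptions chosen for illustration, and regexes alone will not catch every form of sensitive data.

```python
import re

# Illustrative patterns only; production systems typically need broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_sensitive(text: str) -> str:
    """Replace email addresses and SSN-shaped numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return US_SSN.sub("[SSN]", text)
```

Applied to `"Mail ana@example.com, SSN 123-45-6789."`, the function returns `"Mail [EMAIL], SSN [SSN]."`.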
12% of exam
Domain 6: Evaluation and Monitoring
This section covers how to evaluate LLMs and agents, choose metrics, and monitor deployed applications in Databricks. It also includes cost control, inference logging, AI Gateway, custom scorers, and incorporating SME feedback to improve performance.
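To make the idea of a custom scorer concrete, here is a plain-Python sketch that scores an answer by the fraction of required keywords it contains and averages that score over a small evaluation set. The metric and the names `keyword_coverage` and `mean_coverage` are hypothetical; this does not use or represent the MLflow or Databricks scorer APIs.

```python
def keyword_coverage(answer: str, required: list[str]) -> float:
    """Fraction of required keywords found in the answer (case-insensitive)."""
    if not required:
        return 1.0  # nothing was required, so the answer trivially passes
    lowered = answer.lower()
    return sum(k.lower() in lowered for k in required) / len(required)


def mean_coverage(rows: list[tuple[str, list[str]]]) -> float:
    """Average keyword coverage over (answer, required_keywords) pairs."""
    if not rows:
        raise ValueError("rows must not be empty")
    return sum(keyword_coverage(a, req) for a, req in rows) / len(rows)
```

A deterministic scorer like this is cheap to run on every response, which makes it a reasonable complement to slower approaches such as SME review.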
Key Terms to Know
These terms appear across the question explanations.
- AI Gateway
- A Databricks capability that includes Inference Tables, Usage Tables, and rate limiting for tracking and governing deployed LLMs or agents.
- Agent Bricks
- A Databricks feature set for solving problems with specialized agent patterns, including Knowledge Assistant, Multiagent Supervisor, and Information Extraction.
- Agent Framework
- A Databricks framework used to deploy and track LLMs or agents, including with AI Gateway.
- CI/CD
- Continuous integration and continuous delivery/deployment practices used here for updating indexes, promoting prompts, and testing components.
- Databricks App
- An application built on Databricks that can provide an interactive user-facing interface, for example one where customer support agents ask questions and receive grounded answers.
- Databricks Certified Generative AI Engineer Associate
- A Databricks certification exam that assesses the ability to design and implement LLM-enabled solutions using Databricks, including RAG applications, LLM chains, model selection, governance, deployment, and monitoring.
- Databricks Secrets
- A Databricks feature for securely storing sensitive values such as API keys, for example the API key for an external MCP server.
- Delta Lake
- An open-source storage layer that backs tables on Databricks, used here as the destination for writing chunked text tables in Unity Catalog.
- Foundation Model APIs
- Databricks APIs used to serve LLM applications leveraging foundation models.
- Hugging Face Transformers
- An open-source Python library from Hugging Face for loading and running transformer-based models in generative AI applications.
- Inference Tables
- Tables used to track inference activity for deployed models or agents.
- Information Extraction
- An Agent Bricks option used to extract structured information from content.
- Knowledge Assistant
- An Agent Bricks option used to solve problems by providing knowledge-based assistance.
- LLM
- A large language model used in generative AI applications; the exam expects knowledge of current LLMs, their capabilities, and how to select them for tasks.
- LLM chains
- Multi-step application flows that combine an LLM with other components such as tools, retrievers, or prompts to produce an output.
- LLM-as-a-judge
- An evaluation approach where a language model scores or judges model outputs instead of, or in addition to, human raters, for example to rescore responses.
- LangChain
- An open-source framework used in generative AI applications for building chains and related workflows.
- MCP
- Model Context Protocol. MCP servers are integrated to give an agent access to external and managed data sources.
Official Materials and Guidance
This page is built from official Databricks materials and the ExamPal shared release pack: the shared syllabus, topic tree, terminology pack, free pack, and premium pack.
- Databricks GenAI Associate Exam Guide
- Guidance: Official Databricks exam guide PDF with sample questions
- Domain outline: Design Applications 14%; Data Preparation 14%; Application Development 30%; Assembling/Deploying Applications 22%; Governance 8%; Evaluation/Monitoring 12%.