Question 35
Section 1A retailer wants a single model that can answer questions about a product image, a written review, and an audio voicemail in the same conversation. Which capability of Google's Gemini family makes this possible?
Correct answer: A
Explanation
Gemini’s multimodality lets one model handle different input types in the same conversation. It supports “native input of text, images, audio, and video in one model,” so it can answer about a product image, a written review, and an audio voicemail without switching systems.
Why each option is right or wrong
A. Multimodality - native input of text, images, audio, and video in one model
Google’s Gemini models are designed with native multimodal input, meaning a single model can process text, images, audio, and video together rather than requiring separate systems. That directly fits the retailer’s use case because the conversation combines a product image, a written review, and an audio voicemail in one interaction, which is exactly the kind of cross-modal input Gemini is built to handle.
B. Federated personalization across user devices
Federated personalization focuses on adapting models across devices, not jointly understanding image, text, and audio inputs.
C. Quantization-aware training for on-device inference
Quantization-aware training is a model optimization technique for efficient deployment, not a cross-media understanding capability.
D. Symbolic reasoning over a knowledge graph
Symbolic reasoning over knowledge graphs concerns structured logic, not native processing of mixed media in one model.