AIGP Practice Q18

A. The model passively fits a function from labeled inputs to outputs.

This describes supervised learning, where labeled input-output pairs train a predictive function.

B. An agent interacts with an environment, taking actions and learning a policy that maximizes cumulative reward over time.

Reinforcement learning is formally framed as a sequential decision-making problem: an agent observes states, selects actions, receives scalar rewards, and updates a policy to maximize expected return over time. This is the standard Markov decision process setup used in Sutton and Barto’s formulation. The critical feature is the long-horizon objective. The policy is optimized for cumulative reward, not immediate correctness, which is why this option, with its agent interacting with an environment and learning a reward-maximizing policy, matches the core definition.
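To make the agent-environment loop concrete, here is a minimal sketch using tabular Q-learning, one common reinforcement learning method chosen only for illustration. Everything in it is an assumption for teaching purposes and is not part of the exam material: the four-state "corridor" environment, the `step` and `train` functions, and the hyperparameter values are all hypothetical.

```python
import random

N_STATES = 4      # hypothetical corridor: states 0..3, state 3 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy deterministic environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward arrives only when the goal is reached, so earlier moves
    # pay off only through the cumulative (discounted) return.
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by interacting with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):  # step cap so every episode terminates
            if rng.random() < epsilon:
                # Explore: try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a highest-valued action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal states 0, 1, 2.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this sketch no labeled "correct action" is ever given. The agent receives a reward only at the goal, and under these settings the learned greedy policy should choose "right" in every non-terminal state because that maximizes discounted cumulative reward. That is the contrast with options A, C, and D, which involve no actions or reward signal.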

C. The model clusters data points to discover underlying groups.

This describes unsupervised clustering, which groups similar data without action-reward feedback.

D. The model learns from a static dataset without any feedback.

This omits interaction and reward signals, which are central to reinforcement learning.
