Question 18
Which statement best captures the core idea of reinforcement learning?
Correct answer: B
Explanation
Reinforcement learning is defined by an agent that interacts with an environment and learns from the rewards its actions produce. The goal is to learn a policy that maximizes cumulative reward over time, which reflects the trial-and-error, long-term optimization at the heart of the method.
Why each option is right or wrong
A. The model passively fits a function from labeled inputs to outputs.
This describes supervised learning, where labeled input-output pairs train a predictive function.
B. An agent interacts with an environment, taking actions and learning a policy that maximizes cumulative reward over time.
Reinforcement learning is formally framed as a sequential decision-making problem: an agent observes states, selects actions, receives scalar rewards, and updates a policy to maximize expected return over time. This is the standard Markov decision process setup used in Sutton and Barto’s formulation. The critical feature is the long-horizon objective. The policy is optimized for cumulative reward rather than immediate correctness, so an action with a small immediate payoff can be worse than one whose reward arrives later. This option names both the interaction with an environment and the reward-maximizing policy, which is why it matches the core definition.
C. The model clusters data points to discover underlying groups.
This describes unsupervised clustering, which groups similar data without action-reward feedback.
D. The model learns from a static dataset without any feedback.
This omits interaction and reward signals, which are central to reinforcement learning.
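The interaction loop described under option B can be illustrated with a small sketch. Everything in it is an assumption made for illustration and does not come from the question itself: a hypothetical four-state corridor, the action names STOP and GO, the reward values, and the choice of tabular Q-learning with epsilon-greedy exploration as the learning rule. STOP pays a small reward immediately and ends the episode, while GO pays nothing until the final state, so a policy that maximizes cumulative reward has to prefer the delayed payoff.

```python
import random

N_STATES = 4          # states 0..3; state 3 is the terminal goal (toy assumption)
STOP, GO = 0, 1       # hypothetical action labels for this example


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    if action == STOP:
        return state, 1.0, True           # small immediate reward, episode ends
    next_state = state + 1
    if next_state == N_STATES - 1:
        return next_state, 10.0, True     # large delayed reward at the goal
    return next_state, 0.0, False


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.5, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: sometimes explore, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice((STOP, GO))
            else:
                action = max((STOP, GO), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # The target combines the immediate reward with the estimated future return.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_return(q):
    """Follow the learned greedy policy from state 0 and sum the rewards."""
    state, done, total = 0, False, 0.0
    while not done:
        action = max((STOP, GO), key=lambda a: q[state][a])
        state, reward, done = step(state, action)
        total += reward
    return total


if __name__ == "__main__":
    q_table = train()
    print("Q-table:", q_table)
    print("Return of greedy policy:", greedy_return(q_table))
```

Under these assumptions the learned policy should choose GO in every non-terminal state, even though STOP always pays more on the current step. Options A, C, and D have no counterpart to this loop: none of them involves actions that change what the learner sees next, or a reward that arrives only after several steps.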