Question 17
How do reinforcement learning agents learn?
Correct answer: B
Explanation
Reinforcement learning agents learn by taking actions in an environment, observing the resulting states and rewards, and using that feedback to improve future decisions. The goal is to maximize cumulative reward: the agent favors actions that lead to the highest total reward over time, not just the largest immediate reward.
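The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-state chain environment, the `step` function, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are hypothetical choices made here, and tabular Q-learning is just one of many RL algorithms; the question itself does not name any of them.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption for this sketch).

    Returns (next_state, reward, done). Only reaching the goal pays 1.0.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not always go left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=50, seed=0):
    """Tabular Q-learning: learn action values purely from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Target = immediate reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Note that no labeled examples appear anywhere in the sketch: the agent is never told that "right" is the correct action. It only receives a reward at the goal, and the learned policy of moving right in every state emerges from propagating that reward backward through the value estimates.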
Why each option is right or wrong
A. By memorizing labeled examples from a training set
B. By interacting with an environment and maximizing cumulative reward signals
Reinforcement learning is defined by an agent–environment loop: the agent selects an action, observes the resulting state transition and reward, and updates its policy to maximize expected cumulative reward over time. The objective is neither immediate accuracy nor agreement with labeled examples but the long-run return (the sum of rewards across steps, often discounted so that later rewards count less), which is the standard criterion in RL formulations.
C. By computing gradients of cross-entropy loss on labeled classes
D. By clustering states based on similarity without using rewards
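The long-run return named in the explanation of option B can be written out explicitly. The following is a sketch in commonly used notation; the symbols $r_t$ (reward at step $t$), $T$ (episode length), $\gamma$ (discount factor), and $\pi$ (policy) are introduced here for illustration and do not appear in the question.

```latex
% Undiscounted return over an episode that ends at step T
G_t = \sum_{k=0}^{T-t-1} r_{t+k+1}

% Discounted variant (assumes a discount factor 0 \le \gamma < 1)
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}

% The agent seeks a policy that maximizes the expected return
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_t \right]
```

None of options A, C, or D involves a quantity of this kind: A and C rely on labeled data, and D ignores rewards altogether.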