Which reinforcement learning algorithm uses a value function to estimate the expected future reward?
Q-learning
Actor-critic
Overlook minor misbehaviors
Impose harsh punishments for any infraction
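
As a hedged illustration of what "a value function that estimates expected future reward" can look like in practice, here is a minimal sketch of a single tabular Q-learning update. The function name `q_learning_update`, the state and action labels, and the values of `alpha` (learning rate) and `gamma` (discount factor) are all assumptions made for this example; they do not come from the question.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    All names and default values here are illustrative assumptions.
    """
    # Best value currently estimated for the next state (unseen pairs count as 0.0).
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future estimate.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem with made-up state names.
q = defaultdict(float)
actions = ["left", "right"]
first = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
second = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
print(first, second)
```

Each call nudges the stored estimate for the pair `("s0", "right")` toward the observed reward plus the discounted best estimate for the next state, so repeated updates approach the expected return under these assumptions.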

Robotics and Autonomous Systems