Temporal Difference Methods

Technique

Temporal Difference (TD) methods are a class of model-free reinforcement learning techniques that learn to predict future rewards by bootstrapping: each value estimate is updated toward a target built from the observed reward and the current estimate of the next state's value. The update is driven by the difference between successive predictions (the TD error), so no model of the environment's transition or reward dynamics is required. TD methods combine elements of Monte Carlo methods, which learn from sampled experience but must wait for complete episodes, and dynamic programming, which bootstraps from existing estimates but relies on a full model of the environment. Because each update needs only a single observed transition, TD methods can learn online, step by step, which makes them fundamental for sequential decision-making tasks.
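
To make the idea of updating from the difference between successive predictions concrete, the following is a minimal sketch of the simplest variant, tabular TD(0), whose update rule is V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]. The function name `td0_update`, the dictionary-based value table, the two-state episode, and the chosen step size and discount factor are all illustrative assumptions, not part of the entry above.

```python
from typing import Dict, Hashable, Optional


def td0_update(
    values: Dict[Hashable, float],
    state: Hashable,
    reward: float,
    next_state: Optional[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
    terminal: bool = False,
) -> float:
    """Apply one tabular TD(0) update in place and return the TD error."""
    # Bootstrapped target: observed reward plus the discounted current
    # estimate of the next state's value (zero if the episode has ended).
    next_value = 0.0 if terminal else values.get(next_state, 0.0)
    current_value = values.get(state, 0.0)
    td_error = reward + gamma * next_value - current_value
    # Move the estimate a fraction alpha of the way toward the target.
    values[state] = current_value + alpha * td_error
    return td_error


# Hypothetical two-step episode: A -> B with reward 0, then B -> end with reward 1.
episode = [("A", 0.0, "B", False), ("B", 1.0, None, True)]

values: Dict[Hashable, float] = {"A": 0.0, "B": 0.0}
for _ in range(1000):  # replay the same episode many times
    for state, reward, next_state, terminal in episode:
        td0_update(values, state, reward, next_state, terminal=terminal)

print(values)
```

Each call uses a single transition, so the estimates can be refined while an episode is still in progress rather than after it ends. In this deterministic toy chain the estimates approach V(B) = 1 and V(A) = gamma * V(B) = 0.9; in a stochastic environment a fixed step size would leave the estimates fluctuating around the true values instead.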