Temporal Difference Learning

Algorithm

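Temporal Difference (TD) learning is a core method in reinforcement learning that lets an agent learn directly from experience, without a model of the environment’s dynamics. It combines ideas from Monte Carlo methods and dynamic programming: like Monte Carlo, it learns from sampled experience; like dynamic programming, it bootstraps, updating the value estimate of a state from the estimated value of its successor state rather than waiting for the final outcome. Learning is driven by the difference between successive predictions, known as the TD error. Because each update depends on a single sampled transition rather than a full return, TD updates typically have lower variance than Monte Carlo updates, at the cost of some bias from the bootstrapped estimate. This makes TD learning a widely used approach for estimating value functions in sequential decision-making problems.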
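
The bootstrapped update described above can be illustrated with a minimal sketch of tabular TD(0), the simplest TD method. The function names, the step size alpha, the discount factor gamma, and the two-state chain (A, then B, then termination with reward 1) are assumptions chosen for illustration; they do not come from the text above.

```python
from collections import defaultdict


def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.99, terminal=False):
    """Apply one tabular TD(0) update in place and return the TD error."""
    # Bootstrap from the current estimate of the successor state;
    # a terminal successor contributes no future value.
    bootstrap = 0.0 if terminal else values[next_state]
    # TD error: difference between the one-step target and the
    # current prediction for `state`.
    td_error = reward + gamma * bootstrap - values[state]
    # Move the prediction a fraction alpha toward the target.
    values[state] += alpha * td_error
    return td_error


def evaluate_chain(episodes=500, alpha=0.1, gamma=1.0):
    """Estimate state values on a hypothetical deterministic chain.

    Each episode visits A, then B, then terminates; the only reward
    (1.0) is received on the final transition.
    """
    values = defaultdict(float)  # all estimates start at 0.0
    for _ in range(episodes):
        td0_update(values, "A", 0.0, "B", alpha, gamma)
        td0_update(values, "B", 1.0, None, alpha, gamma, terminal=True)
    return values


if __name__ == "__main__":
    estimates = evaluate_chain()
    print(dict(estimates))
```

Each update uses only one observed transition, so the estimate for A improves as soon as the estimate for B does, without waiting for a complete episode return. With gamma set to 1.0, both estimates in this toy chain approach 1.0.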