The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that, for a special case of temporal differences, the expected values of the predictions converge to their correct values as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.