Dynamic programming: deterministic and stochastic models
Sequential decision problems and neural networks
Advances in neural information processing systems 2
Technical Note: Q-Learning
Machine Learning
The Convergence of TD(λ) for General λ
Machine Learning
Asynchronous Stochastic Approximation and Q-Learning
Machine Learning
Learning to act using real-time dynamic programming
Artificial Intelligence - Special volume on computational research on interaction and agency, part 1
Parallel and Distributed Computation: Numerical Methods
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
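To make the stochastic-approximation view concrete, the following is a minimal sketch of tabular Q-learning on a small Markov decision process. It is not taken from the paper: the two-state environment, the reward noise, the discount factor, and the step-size schedule n^(-0.6) are all assumptions chosen for illustration. The schedule is one example of step sizes whose sum diverges while the sum of their squares stays finite, which is the usual kind of condition in stochastic approximation results.

```python
import random

GAMMA = 0.5                 # discount factor (illustrative assumption)
N_STATES, N_ACTIONS = 2, 2  # hypothetical toy MDP, not from the paper


def mean_reward(next_state):
    """Expected reward: 1 for landing in state 1, 0 otherwise."""
    return 1.0 if next_state == 1 else 0.0


def transition(state, action):
    """Action 0 stays in the current state, action 1 moves to the other one."""
    return state if action == 0 else 1 - state


def q_star(sweeps=200):
    """Exact Q-values from the DP recursion (value iteration on Q)."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        new_q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                ns = transition(s, a)
                new_q[s][a] = mean_reward(ns) + GAMMA * max(q[ns])
        q = new_q
    return q


def q_learning(n_steps=20000, seed=0):
    """Q-learning from noisy samples, using decaying per-pair step sizes."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    visits = [[0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        action = rng.randrange(N_ACTIONS)  # uniform random exploration
        next_state = transition(state, action)
        # Bounded zero-mean noise makes each sample a noisy DP backup.
        reward = mean_reward(next_state) + rng.uniform(-0.1, 0.1)
        visits[state][action] += 1
        # Step size n^(-0.6): sum diverges, sum of squares converges.
        alpha = visits[state][action] ** -0.6
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q
```

Calling `q_learning()` and comparing the result entry by entry with `q_star()` shows the sampled iteration settling near the DP solution. How closely it does so depends on the noise level and on the number of steps, both of which are arbitrary choices in this sketch.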