On the convergence of stochastic iterative dynamic programming algorithms

Authors:
Tommi Jaakkola;Michael I. Jordan;Satinder P. Singh
Affiliations:
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA;Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA;Department of Computer Science, University of Massachusetts, Amherst, MA 01003 USA
Venue:
Neural Computation
Year:
1994

Citing 8
Cited 46

Dynamic programming: deterministic and stochastic models

Dynamic programming: deterministic and stochastic models
Sequential decision problems and neural networks

Advances in neural information processing systems 2
Technical Note: Q-Learning

Machine Learning
The Convergence of TD(λ) for General λ

Machine Learning
Asynchronous Stochastic Approximation and Q-Learning

Machine Learning
Learning to act using real-time dynamic programming

Artificial Intelligence - Special volume on computational research on interaction and agency, part 1
Parallel and Distributed Computation: Numerical Methods

Parallel and Distributed Computation: Numerical Methods
Learning to Predict by the Methods of Temporal Differences

Machine Learning

Module Based Reinforcement Learning: An Application to a Real Robot

EWLR-6 Proceedings of the 6th European Workshop on Learning Robots
On the Use of Option Policies for Autonomous Robot Navigation

IBERAMIA-SBIA '00 Proceedings of the International Joint Conference, 7th Ibero-American Conference on AI: Advances in Artificial Intelligence
An Analysis of the Pheromone Q-Learning Algorithm

IBERAMIA 2002 Proceedings of the 8th Ibero-American Conference on AI: Advances in Artificial Intelligence
Sequential Strategy for Learning Multi-stage Multi-agent Collaborative Games

ICANN '01 Proceedings of the International Conference on Artificial Neural Networks
Learning Multi-agent Strategies in Multi-stage Collaborative Games

IDEAL '02 Proceedings of the Third International Conference on Intelligent Data Engineering and Automated Learning
On the Asymptotic Behaviour of a Constant Stepsize Temporal-Difference Learning Algorithm

EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
Learning Rates for Q-Learning

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
A Multi-agent Q-learning Framework for Optimizing Stock Trading Systems

DEXA '02 Proceedings of the 13th International Conference on Database and Expert Systems Applications
Temporal Sequence Learning, Prediction, and Control: A Review of Different Models and Their Relation to Biological Mechanisms

Neural Computation
The asymptotic equipartition property in reinforcement learning and its relation to return maximization

Neural Networks
A reinforcement learning approach to dynamic resource allocation

Engineering Applications of Artificial Intelligence
Performance Loss Bounds for Approximate Value Iteration with State Aggregation

Mathematics of Operations Research
Restricted gradient-descent algorithm for value-function approximation in reinforcement learning

Artificial Intelligence
Learning how to combine sensory-motor functions into a robust behavior

Artificial Intelligence
Parallel Reinforcement Learning for Weighted Multi-criteria Model with Adaptive Margin

Neural Information Processing
Route Optimization Using Q-Learning for On-Demand Bus Systems

KES '08 Proceedings of the 12th international conference on Knowledge-Based Intelligent Information and Engineering Systems, Part II
Dynamic packaging in e-retailing with stochastic demand over finite horizons: A Q-learning approach

Expert Systems with Applications: An International Journal
Reinforcement distribution in fuzzy Q-learning

Fuzzy Sets and Systems
An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem

Mathematics of Operations Research
Reinforcement Learning: A Tutorial Survey and Recent Advances

INFORMS Journal on Computing
SarsaLandmark: an algorithm for learning in POMDPs with landmarks

Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Hierarchical reinforcement learning with the MAXQ value function decomposition

Journal of Artificial Intelligence Research
Efficient reinforcement learning using recursive least-squares methods

Journal of Artificial Intelligence Research
Reinforcement learning: a survey

Journal of Artificial Intelligence Research
Truncating temporal differences: on the efficient implementation of TD(λ) for reinforcement learning

Journal of Artificial Intelligence Research
Route optimisation using evolutionary approaches for on-demand pickup problem

International Journal of Advanced Intelligence Paradigms
A reinforcement learning approach to dynamic resource allocation

A reinforcement learning approach to dynamic resource allocation
Adaptive state space partitioning for reinforcement learning

Engineering Applications of Artificial Intelligence
Cooperative multi-robot reinforcement learning: a framework in hybrid state space

IROS'09 Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems
Review article: Synergizing reinforcement learning and game theory-A new direction for control

Applied Soft Computing
Counter example for Q-bucket-brigade under prediction problem

IWLCS'03-05 Proceedings of the 2003-2005 international conference on Learning classifier systems
Joint path and wavelength selection using Q-learning in optical burst switching networks

ICC'09 Proceedings of the 2009 IEEE international conference on Communications
Learning hybridization strategies in evolutionary algorithms

Intelligent Data Analysis
Reinforcement learning with time

AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Exploiting Best-Match Equations for Efficient Reinforcement Learning

The Journal of Machine Learning Research
Learning multi-modal control programs

HSCC'05 Proceedings of the 8th international conference on Hybrid Systems: computation and control
Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Mathematics of Operations Research
Adaptive stock trading with dynamic asset allocation using reinforcement learning

Information Sciences: an International Journal
Enhanced temporal difference learning using compiled eligibility traces

AI'06 Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence
Value-function reinforcement learning in Markov games

Cognitive Systems Research
Event-learning and robust policy heuristics

Cognitive Systems Research
Comparative evaluation of MAL algorithms in a diverse set of ad hoc team problems

Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Reinforcement-Learning-Based Double Auction Design for Dynamic Spectrum Access in Cognitive Radio Networks

Wireless Personal Communications: An International Journal
Smart exploration in reinforcement learning using absolute temporal difference errors

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Dynamic policy programming

The Journal of Machine Learning Research
Reinforcement learning algorithms with function approximation: Recent advances and applications

Information Sciences: an International Journal

Abstract

Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
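
As a rough illustration of the relationship the abstract describes, the sketch below compares exact dynamic programming on Q-values with tabular Q-learning run as a stochastic approximation of the same fixed point. It is not taken from the paper: the two-state MDP (`P`, `R`, `GAMMA`), the uniform random behaviour policy, and the step-size schedule `visits ** -0.7` are all assumptions chosen for illustration. The schedule is one common choice whose sum diverges while its sum of squares stays finite, which is the usual stochastic approximation requirement.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])  # P[s, a, s']: transition probabilities
R = np.array([[0.0, 1.0],
              [0.5, 2.0]])                # R[s, a]: deterministic rewards
GAMMA = 0.5                               # discount factor (assumed)


def q_value_iteration(n_sweeps=200):
    """Exact DP: repeatedly apply the Bellman optimality operator to Q."""
    q = np.zeros_like(R)
    for _ in range(n_sweeps):
        # P @ v sums over next states, giving E[max_a' Q(s', a')] per (s, a).
        q = R + GAMMA * P @ q.max(axis=1)
    return q


def q_learning(n_steps=100_000, seed=0):
    """Tabular Q-learning from sampled transitions with decaying step sizes."""
    rng = np.random.default_rng(seed)
    q = np.zeros_like(R)
    visits = np.zeros_like(R)
    s = 0
    for _ in range(n_steps):
        a = int(rng.integers(2))                 # uniform random behaviour policy
        s_next = int(rng.random() < P[s, a, 1])  # sample next state from P[s, a]
        visits[s, a] += 1
        alpha = visits[s, a] ** -0.7             # assumed step-size schedule
        target = R[s, a] + GAMMA * q[s_next].max()
        q[s, a] += alpha * (target - q[s, a])    # noisy step toward the DP backup
        s = s_next
    return q


if __name__ == "__main__":
    q_star = q_value_iteration()
    q_hat = q_learning()
    print("max |Q_learned - Q*| =", np.abs(q_hat - q_star).max())
```

Each Q-learning update replaces the expectation in the DP backup with a single sampled transition, so the decaying step size is what averages out the sampling noise over repeated visits to the same state-action pair.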