Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms

Authors:
Satinder Singh; Tommi Jaakkola; Michael L. Littman; Csaba Szepesvári
Affiliations:
AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932, USA. baveja@research.att.com; Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. tommi@ai.mit.edu; Department of Computer Science, Duke University, Durham, NC 27708-0129, USA. mlittman@cs.duke.edu; Mindmaker Ltd., Konkoly Thege M. u. 29-33, Budapest 1121, Hungary. szepes@mindmaker.hu
Venue:
Machine Learning
Year:
2000


Abstract

An important application of reinforcement learning (RL) is to finite-state control problems, and one of the most difficult problems in learning for control is balancing the exploration/exploitation tradeoff. Existing theoretical results for RL give very little guidance on reasonable ways to perform exploration. In this paper, we examine the convergence of single-step on-policy RL algorithms for control. On-policy algorithms cannot separate exploration from learning and therefore must confront the exploration problem directly. We prove convergence results for several related on-policy algorithms with both decaying exploration and persistent exploration. We also provide examples of exploration strategies that can be followed during learning that result in convergence to both optimal values and optimal policies.
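
To make the setting concrete, the following is a minimal sketch of one single-step on-policy update with decaying exploration. It is an illustration under stated assumptions, not the algorithm or the analysis from the paper. It assumes a tabular Sarsa(0)-style update and an epsilon-greedy rule whose epsilon decays as 1/(number of visits to the state). It also assumes a learning rate of 1/(number of updates to the state-action pair). The three-state chain `chain_step`, the function `sarsa0`, and all parameter values are hypothetical.

```python
import random
from collections import defaultdict

LEFT, RIGHT = 0, 1
GOAL = 2  # terminal state of the hypothetical 3-state chain


def chain_step(state, action):
    """Hypothetical toy MDP: states 0 and 1, goal at 2, cost of -1 per move."""
    nxt = state + 1 if action == RIGHT else max(state - 1, 0)
    return nxt, -1.0, nxt == GOAL


def sarsa0(episodes=500, gamma=0.9, max_steps=1000, seed=0):
    """Tabular single-step on-policy control with decaying exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)            # Q[(state, action)], initialised to 0
    state_visits = defaultdict(int)   # drives the exploration decay
    pair_visits = defaultdict(int)    # drives the learning-rate decay

    def choose(state):
        # Assumed schedule: epsilon = 1 / (visits to this state).
        state_visits[state] += 1
        epsilon = 1.0 / state_visits[state]
        if rng.random() < epsilon:
            return rng.choice((LEFT, RIGHT))
        return max((LEFT, RIGHT), key=lambda a: q[(state, a)])

    for _episode in range(episodes):
        s = 0
        a = choose(s)
        for _step in range(max_steps):  # safety cap on episode length
            s2, r, done = chain_step(s, a)
            # On-policy: the bootstrap uses the action actually chosen next,
            # so exploration and learning cannot be separated.
            a2 = None if done else choose(s2)
            target = r if done else r + gamma * q[(s2, a2)]
            pair_visits[(s, a)] += 1
            alpha = 1.0 / pair_visits[(s, a)]  # assumed step-size schedule
            q[(s, a)] += alpha * (target - q[(s, a)])
            if done:
                break
            s, a = s2, a2
    return q
```

Calling `sarsa0()` returns the learned action-value table. On this toy chain the greedy action in state 1 is expected to be `RIGHT`, since that move ends the episode at a cost of exactly -1. The exploration rate is tied to per-state visit counts so that every action keeps being tried while the behaviour policy becomes greedy in the limit, which is the kind of decaying exploration the abstract refers to.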