A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms

  • Authors:
  • Csaba Szepesvári; Michael L. Littman

  • Affiliations:
  • Mindmaker, Ltd., Budapest 1121, Konkoly Thege M. U. 29-33, Hungary; Department of Computer Science, Duke University, Durham, NC 27708-0129, U.S.A.

  • Venue:
  • Neural Computation
  • Year:
  • 1999

Abstract

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity to interact with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
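
To make the asynchronous/synchronous distinction concrete, the sketch below contrasts the standard tabular Q-learning update with the synchronous Bellman optimality operator it approximates. These are the usual textbook forms, not formulas quoted from the paper; the step sizes \(\alpha_t\) and discount factor \(\gamma\) are the conventional symbols.

```latex
% Asynchronous update: Q-learning adjusts a single entry along an
% observed transition (s_t, a_t, r_t, s_{t+1}), with step size \alpha_t:
Q_{t+1}(s_t, a_t) = (1 - \alpha_t)\, Q_t(s_t, a_t)
  + \alpha_t \Bigl[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \Bigr]

% Synchronous counterpart: one full sweep of the Bellman optimality
% operator T over every state-action pair (s, a):
(T Q)(s, a) = \sum_{s'} P(s' \mid s, a)
  \Bigl[ R(s, a, s') + \gamma \max_{a'} Q(s', a') \Bigr]
```

The payoff described in the abstract is that convergence of the first, noisy, one-entry-at-a-time update can be deduced from the much simpler convergence argument for repeated application of the operator \(T\).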