Asynchronous Stochastic Approximation and Q-Learning

  • Authors: John N. Tsitsiklis
  • Affiliations: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139. jnt@athena.mit.edu
  • Venue: Machine Learning
  • Year: 1994

Abstract

We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
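For concreteness, here is a minimal sketch of the tabular, asynchronous Q-learning iteration the paper analyzes: at each step a single state-action pair is updated toward a sampled one-step target. The toy MDP, the uniform choice of which pair to update, the 1/n step-size schedule, and all identifiers below are illustrative assumptions, not details taken from the paper.

```python
import random

# Hypothetical 2-state, 2-action MDP (assumed for illustration):
# TRANSITIONS[s][a] = list of (probability, next_state, reward) tuples.
TRANSITIONS = {
    0: {0: [(0.9, 0, 0.0), (0.1, 1, 1.0)],
        1: [(0.5, 0, 0.0), (0.5, 1, 1.0)]},
    1: {0: [(1.0, 0, 2.0)],
        1: [(1.0, 1, 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed)

def sample(s, a):
    """Sample a (next_state, reward) pair from the toy MDP."""
    u, acc = random.random(), 0.0
    for p, s2, rew in TRANSITIONS[s][a]:
        acc += p
        if u <= acc:
            return s2, rew
    return TRANSITIONS[s][a][-1][1:]

def q_learning(num_steps=200_000):
    Q = {(s, a): 0.0 for s in TRANSITIONS for a in TRANSITIONS[s]}
    counts = {key: 0 for key in Q}  # per-pair update counts
    for _ in range(num_steps):
        # Asynchronous update: only one (s, a) component of Q is revised
        # per step, chosen here uniformly at random for simplicity.
        s, a = random.choice(list(Q))
        s2, rew = sample(s, a)
        counts[(s, a)] += 1
        # Step size 1/n for the n-th update of this pair satisfies the
        # usual Robbins-Monro conditions (sum = inf, sum of squares < inf).
        alpha = 1.0 / counts[(s, a)]
        target = rew + GAMMA * max(Q[(s2, b)] for b in TRANSITIONS[s2])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

if __name__ == "__main__":
    random.seed(0)
    for key, val in sorted(q_learning().items()):
        print(key, round(val, 3))
```

The 1/n schedule is one choice satisfying the standard stochastic approximation step-size conditions, which is the kind of assumption under which convergence results of this type apply.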