Convergence of synchronous reinforcement learning with linear function approximation

Authors: Artur Merke; Ralf Schoknecht
Affiliations: University of Dortmund, Dortmund, Germany; University of Karlsruhe, Karlsruhe, Germany
Venue: ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year: 2004

Citing 7

Invariant subspaces of matrices with applications
An introduction to difference equations
Iterative methods for solving linear systems
Neuro-Dynamic Programming
Learning to Predict by the Methods of Temporal Differences. Machine Learning
A Necessary Condition of Convergence for Reinforcement Learning with Function Approximation. ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Policy Iteration for Factored MDPs. UAI '00 Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence

Cited 1

Counter example for Q-bucket-brigade under prediction problem. IWLCS'03-05 Proceedings of the 2003-2005 international conference on Learning classifier systems

Abstract

Synchronous reinforcement learning (RL) algorithms with linear function approximation can be represented as inhomogeneous matrix iterations of a special form (Schoknecht & Merke, 2003). In this paper we state convergence conditions for general inhomogeneous matrix iterations and prove that they are both necessary and sufficient. This result extends the work of Schoknecht & Merke (2003), where only a sufficient condition for convergence was proved. Because the condition is both necessary and sufficient, the new result can be used to prove divergence as well as convergence of RL algorithms with function approximation. We use the theorem to derive a new, concise proof of convergence for the synchronous residual gradient algorithm (Baird, 1995). Moreover, we construct a counterexample for which the uniform RL algorithm (Merke & Schoknecht, 2002) diverges. This gives a negative answer to the open question of whether the uniform RL algorithm converges for arbitrary multiple transitions.
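
To make the term "inhomogeneous matrix iteration" concrete, the sketch below runs a generic iteration x_{k+1} = M x_k + c on two small examples. The abstract does not spell out the special form used in the paper, nor its necessary and sufficient condition, so this form, the matrices, and the helper name `iterate` are all illustrative assumptions. The sketch only shows the textbook sufficient condition: if every eigenvalue of M has modulus below 1, the iteration converges from any start to the solution of (I - M) x = c. That condition is not necessary in general (with M = I and c = 0 the iteration is constant, hence convergent), which is one reason a condition that is both necessary and sufficient is of interest.

```python
def iterate(M, c, x0, steps):
    """Run x_{k+1} = M x_k + c for a fixed number of steps.

    Hypothetical helper for illustration; M is a list of rows,
    c and x0 are lists of floats.
    """
    x = list(x0)
    for _ in range(steps):
        x = [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) + c_i
             for row, c_i in zip(M, c)]
    return x


c = [1.0, 1.0]

# Assumed example 1: triangular matrix, eigenvalues 0.5 and 0.8 (both < 1).
# The fixed point solves (I - M) x = c, which gives x = [3, 5].
M_stable = [[0.5, 0.1],
            [0.0, 0.8]]
x_stable = iterate(M_stable, c, [0.0, 0.0], 200)

# Assumed example 2: diagonal matrix with an eigenvalue 1.1 (> 1).
# The first component grows without bound.
M_unstable = [[1.1, 0.0],
              [0.0, 0.5]]
x_unstable = iterate(M_unstable, c, [0.0, 0.0], 200)

print(x_stable)
print(x_unstable)
```

Both matrices are triangular so that their eigenvalues can be read off the diagonal without a numerical library.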