Monte Carlo matrix inversion policy evaluation

  • Authors:
  • Fletcher Lu; Dale Schuurmans

  • Affiliations:
  • School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada (both authors)

  • Venue:
  • UAI'03: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence
  • Year:
  • 2003


Abstract

Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed relations between this technique and standard dynamic programming and temporal differencing methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with respect to state-space size than alternative techniques. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI achieves accuracy similar to a maximum likelihood (ML) model-based policy evaluation approach while avoiding ML's slow execution time; in fact, MCMI executes at a runtime similar to that of temporal differencing (TD). We then illustrate a least-squares generalization technique for scaling up MCMI to large state spaces. We compare this least-squares Monte Carlo matrix inversion (LS-MCMI) technique to the least-squares temporal differencing (LSTD) approach introduced by Bradtke and Barto (1996), demonstrating that LS-MCMI and LSTD have similar runtimes.
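The core idea, estimating entries of the matrix inverse (I - gamma*P)^{-1} as expected values over random walks, can be sketched for tabular policy evaluation as follows. This is a minimal illustration, not the authors' exact algorithm: it assumes the policy-induced transition matrix P, reward vector r, and discount factor gamma are available (or can be sampled from a simulator), and the function and parameter names (mcmi_policy_evaluation, n_walks) are hypothetical.

```python
# Minimal MCMI policy-evaluation sketch (illustrative; names are hypothetical).
import numpy as np

def mcmi_policy_evaluation(P, r, gamma, n_walks=1000, rng=None):
    """Estimate V = (I - gamma*P)^{-1} r via random walks.

    Each entry N[i, j] of the inverse equals the expected number of visits
    to state j on a walk started at i that terminates with probability
    1 - gamma after every step, i.e. the Neumann series sum_k gamma^k P^k.
    """
    rng = np.random.default_rng(rng)
    n = len(r)
    N_hat = np.zeros((n, n))
    for i in range(n):
        for _ in range(n_walks):
            s = i
            while True:
                N_hat[i, s] += 1.0            # count the visit to state s
                if rng.random() >= gamma:     # terminate with prob 1 - gamma
                    break
                s = rng.choice(n, p=P[s])     # step to the next state
        N_hat[i] /= n_walks                   # Monte Carlo average of row i
    return N_hat @ r                          # V_hat = N_hat r

# Tiny usage example on a 2-state chain, checked against direct solution.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
V_hat = mcmi_policy_evaluation(P, r, gamma=0.9, rng=0)
V_exact = np.linalg.solve(np.eye(2) - 0.9 * P, r)
print(V_hat, V_exact)
```

In a model-free setting the same visit-count estimates can be accumulated from sampled trajectories rather than from an explicit P; the least-squares variant (LS-MCMI) discussed in the paper replaces the tabular counts with feature-based regression, analogously to how LSTD generalizes TD.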