Monte Carlo matrix inversion policy evaluation

  • Authors:
  • Fletcher Lu; Dale Schuurmans

  • Affiliations:
  • School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada (both authors)

  • Venue:
  • UAI'03: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence
  • Year:
  • 2003


Abstract

Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed relations between this technique and standard dynamic programming and temporal differencing methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with respect to state-space size than alternative techniques. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI achieves accuracy similar to a maximum likelihood (ML) model-based policy evaluation approach while avoiding ML's slow execution time; in fact, MCMI executes at a runtime similar to that of temporal differencing (TD). We then illustrate a least-squares generalization technique for scaling up MCMI to large state spaces. We compare this least-squares Monte Carlo matrix inversion (LS-MCMI) technique to the least-squares temporal differencing (LSTD) approach introduced by Bradtke and Barto (1996), demonstrating that LS-MCMI and LSTD have similar runtimes.
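The core idea, estimating entries of the matrix inverse (I - gamma*P)^{-1} as expected values over random walks, can be sketched for tabular policy evaluation as follows. This is a minimal illustration, not the authors' exact algorithm: it assumes the policy-induced transition matrix P, reward vector r, and discount factor gamma are available (or can be sampled from a simulator), and the function and parameter names (mcmi_policy_evaluation, n_walks) are hypothetical.

```python
# Minimal MCMI policy-evaluation sketch (illustrative; names are hypothetical).
import numpy as np

def mcmi_policy_evaluation(P, r, gamma, n_walks=1000, rng=None):
    """Estimate V = (I - gamma*P)^{-1} r via random walks.

    Each entry N[i, j] of the inverse equals the expected number of visits
    to state j on a walk started at i that terminates with probability
    1 - gamma after every step, i.e. the Neumann series sum_k gamma^k P^k.
    """
    rng = np.random.default_rng(rng)
    n = len(r)
    N_hat = np.zeros((n, n))
    for i in range(n):
        for _ in range(n_walks):
            s = i
            while True:
                N_hat[i, s] += 1.0            # count the visit to state s
                if rng.random() >= gamma:     # terminate with prob 1 - gamma
                    break
                s = rng.choice(n, p=P[s])     # step to the next state
        N_hat[i] /= n_walks                   # Monte Carlo average of row i
    return N_hat @ r                          # V_hat = N_hat r

# Tiny usage example on a 2-state chain, checked against direct solution.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
V_hat = mcmi_policy_evaluation(P, r, gamma=0.9, rng=0)
V_exact = np.linalg.solve(np.eye(2) - 0.9 * P, r)
print(V_hat, V_exact)
```

In a model-free setting the same visit-count estimates can be accumulated from sampled trajectories rather than from an explicit P; the least-squares variant (LS-MCMI) discussed in the paper replaces the tabular counts with feature-based regression, analogously to how LSTD generalizes TD.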