We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov chains. The method is based on designing sequential control variates using successive approximations of the function of interest V. Regular Monte Carlo estimates have a variance of O(1/N), where N is the number of samples. Here, we obtain a geometric variance reduction O(ρ^N) (with ρ < 1), up to a threshold that depends on the approximation error V − AV, where A is an approximation operator linear in the values. Thus, if V belongs to the right approximation space (i.e. AV = V), the variance decreases geometrically to zero. An immediate application is value function estimation in Markov chains, which may be used for policy evaluation within policy iteration for Markov Decision Processes. Another important domain, for which variance reduction is highly needed, is gradient estimation, that is, computing the sensitivity ∂_α V of the performance measure V with respect to some parameter α of the transition probabilities. For example, in parametric optimization of the policy, an estimate of the policy gradient is required to perform a gradient-based optimization method. We show that, using two approximations, one of the value function and one of its gradient, a geometric variance reduction is also achieved, up to a threshold that depends on the approximation errors of both representations.
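The core mechanism can be sketched on a toy problem. The sketch below (not the paper's algorithm; the chain, rewards, and sample sizes are illustrative choices) estimates the value function of a small absorbing Markov chain. At each iteration, the current approximation V_n serves as a control variate: the Bellman residual d_n = r + P V_n − V_n satisfies (V − V_n)(s) = E[Σ_t d_n(x_t)], so the Monte Carlo correction has variance proportional to ||V − V_n||². Since the tabular representation here satisfies AV = V exactly, the error shrinks geometrically across iterations rather than at the usual O(1/N) rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy absorbing Markov chain: transient states {0, 1}, absorbing state 2.
# V solves V = r + P V on the transient states (with V = 0 at absorption).
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.0, 0.0, 1.0]])
r = np.array([1.0, 2.0, 0.0])
Q = P[:2, :2]                              # transitions among transient states
V_true = np.linalg.solve(np.eye(2) - Q, r[:2])

def trajectory(s):
    """States visited from s until absorption in state 2."""
    states = []
    while s != 2:
        states.append(s)
        s = rng.choice(3, p=P[s])
    return states

def correction(V_n, s, n_traj=50):
    """Monte Carlo estimate of (V - V_n)(s), using V_n as control variate.

    The summands are values of the Bellman residual d_n = r + P V_n - V_n,
    whose magnitude shrinks with the current error ||V - V_n|| -- and with
    it the variance of this estimator.
    """
    Vn3 = np.append(V_n, 0.0)              # extend V_n by 0 at the absorbing state
    d = r[:2] + (P[:2] @ Vn3) - V_n        # Bellman residual of V_n
    est = [sum(d[x] for x in trajectory(s)) for _ in range(n_traj)]
    return np.mean(est)

V = np.zeros(2)
errors = []
for n in range(30):
    V = V + np.array([correction(V, s) for s in (0, 1)])
    errors.append(np.max(np.abs(V - V_true)))

print(errors[0], errors[-1])               # error shrinks geometrically over iterations
```

Because the representation is tabular (A is the identity), the threshold ||V − AV|| is zero and the iterates converge to V up to floating-point precision; with a genuine approximation space, the same loop would stall at the approximation error, as the abstract describes.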
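For the gradient-estimation setting, a minimal illustration of why an auxiliary approximation reduces variance is the classical likelihood-ratio (score function) estimator with a control variate. The sketch below is a simplified stand-in for the paper's paired value/gradient approximations (the distribution, test function, and sample size are illustrative): subtracting an approximation of E[f(X)] from f(X) leaves the gradient estimate unbiased, since the score has zero mean, while cutting its variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Likelihood-ratio gradient of J(alpha) = E[f(X)] for X ~ N(alpha, 1),
# with f(x) = x^2, so J(alpha) = alpha^2 + 1 and dJ/d_alpha = 2 * alpha.
alpha = 2.0
x = rng.normal(alpha, 1.0, size=200_000)
score = x - alpha                      # d/d_alpha log p_alpha(x) for N(alpha, 1)

g_plain = x**2 * score                 # plain likelihood-ratio estimator
baseline = alpha**2 + 1.0              # control variate: approximation of E[f(X)]
g_cv = (x**2 - baseline) * score       # unbiased, since E[baseline * score] = 0

print(g_plain.mean(), g_cv.mean())     # both approximate dJ/d_alpha = 4
print(g_plain.var(), g_cv.var())       # the baseline roughly halves the variance here
```

In the paper's setting, the role of this constant baseline is played by a learned approximation of the value function, and the same idea is applied sequentially to drive the gradient estimate's variance down geometrically, up to the threshold set by the approximation errors.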