Combining importance sampling and temporal difference control variates to simulate Markov Chains

  • Authors:
  • R. S. Randhawa; S. Juneja

  • Affiliations:
  • Stanford University, Stanford, CA; Tata Institute of Fundamental Research, Mumbai, India

  • Venue:
  • ACM Transactions on Modeling and Computer Simulation (TOMACS)
  • Year:
  • 2004

Abstract

It is well known that, in estimating performance measures associated with a stochastic system, a good importance sampling (IS) distribution can give orders of magnitude of variance reduction, while a bad one may lead to large, even infinite, variance. In this paper we study how this sensitivity of the estimator variance to the importance sampling change of measure may be "dampened" by combining importance sampling with a stochastic-approximation-based temporal difference (TD) method. We consider a finite state space discrete time Markov chain (DTMC) with one-step transition rewards and an absorbing set of states, and focus on estimating the expected cumulative reward to absorption starting from any state. In this setting we develop sufficient conditions under which the estimate resulting from the combined approach has a mean square error that asymptotically equals zero, even when the estimate formed by using only the importance sampling change of measure has infinite variance. In particular, we consider the problem of estimating the small probability of buffer overflow in a queuing network, where the change of measure suggested in the literature is shown to have infinite variance under certain parameters, and where an appropriate combination of the IS and TD methods is empirically seen to converge much faster than naive simulation.
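
To make the combination concrete, the sketch below is a minimal illustration (not the authors' algorithm) of the general idea the abstract describes: simulate an absorbing DTMC under a hypothetical IS transition matrix Q instead of the original matrix P, and fold the per-step likelihood ratio P[s, s']/Q[s, s'] into a TD(0)-style stochastic-approximation update whose fixed point is the expected cumulative reward to absorption. The 4-state chain, the matrices P and Q, the reward vector r, and the step-size schedule are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: IS-weighted TD(0)-style estimation of the expected
# cumulative reward to absorption in a small absorbing DTMC. The chain,
# matrices, and step sizes below are hypothetical, not from the paper.

rng = np.random.default_rng(0)

# Original transition matrix P; state 3 is absorbing.
P = np.array([[0.5, 0.3, 0.1, 0.1],
              [0.2, 0.5, 0.2, 0.1],
              [0.1, 0.2, 0.5, 0.2],
              [0.0, 0.0, 0.0, 1.0]])
# IS change of measure Q: a different transition matrix covering P's support.
Q = np.array([[0.4, 0.3, 0.2, 0.1],
              [0.2, 0.4, 0.2, 0.2],
              [0.1, 0.2, 0.4, 0.3],
              [0.0, 0.0, 0.0, 1.0]])
r = np.array([1.0, 2.0, 3.0, 0.0])   # one-step rewards; zero at absorption

V = np.zeros(4)                      # estimated reward-to-absorption per state
for episode in range(20000):
    s = rng.integers(0, 3)           # start from a random transient state
    alpha = 10.0 / (10.0 + episode)  # decreasing stochastic-approximation step
    while s != 3:
        s_next = rng.choice(4, p=Q[s])
        lr = P[s, s_next] / Q[s, s_next]              # per-step likelihood ratio
        td_error = lr * (r[s] + V[s_next]) - V[s]     # IS-weighted TD(0) error
        V[s] += alpha * td_error
        s = s_next

print("estimated reward-to-absorption:", V[:3])
```

Under the sampling measure Q, the expectation of the IS-weighted target lr * (r[s] + V[s_next]) equals the target under P, so the update converges (for suitable step sizes) to the same fixed point as TD(0) run on the original chain; the point of the paper is that this stochastic-approximation averaging can keep the mean square error well behaved even when the plain IS estimator based on full-trajectory likelihood ratios has infinite variance.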