Likelihood ratio gradient estimation for stochastic systems

Authors:
Peter W. Glynn
Affiliations:
Stanford Univ., Stanford, CA
Venue:
Communications of the ACM - Special issue on simulation
Year:
1990

Citing 8
Cited 39

The score function approach for sensitivity analysis of computer simulation models

Mathematics and Computers in Simulation
Sensitivity analysis and performance extrapolation for computer simulation models

Operations Research
Importance sampling for stochastic simulations

Management Science
Optimization of stochastic systems via simulation

WSC '89 Proceedings of the 21st conference on Winter simulation
Sensitivity analysis via likelihood ratios

WSC '86 Proceedings of the 18th conference on Winter simulation
Stochastic approximation for Monte Carlo optimization

WSC '86 Proceedings of the 18th conference on Winter simulation
Likelihood ratio gradient estimation: an overview

WSC '87 Proceedings of the 19th conference on Winter simulation
On the role of generalized semi-Markov processes in simulation output analysis

WSC '83 Proceedings of the 15th conference on Winter simulation - Volume 1

Derivatives of likelihood ratios and smoothed perturbation analysis for the routing problem

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Fast simulation methods for highly dependable systems

WSC '94 Proceedings of the 26th conference on Winter simulation
Two approaches for estimating the gradient in functional form

WSC '93 Proceedings of the 25th conference on Winter simulation
Computational efficiency evaluation in output analysis

Proceedings of the 29th conference on Winter simulation
A review of simulation optimization techniques

Proceedings of the 30th conference on Winter simulation
Adaptive stochastic manpower scheduling

Proceedings of the 30th conference on Winter simulation
Exploiting multiple regeneration sequences in simulation output analysis

Proceedings of the 30th conference on Winter simulation
An overview of derivative estimation

WSC '91 Proceedings of the 23rd conference on Winter simulation
Gradient estimation for ratios

WSC '91 Proceedings of the 23rd conference on Winter simulation
Comparing alternative methods for derivative estimation when IPA does not apply directly

WSC '91 Proceedings of the 23rd conference on Winter simulation
On the small-sample optimality of multiple-regeneration estimators

Proceedings of the 31st conference on Winter simulation: Simulation - a bridge to the future - Volume 1
Regenerative steady-state simulation of discrete-event systems

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Combining the Stochastic Counterpart and Stochastic Approximation Methods

Discrete Event Dynamic Systems
Functional Estimation with Respect to a Threshold Parameter via Dynamic Split-and-Merge

Discrete Event Dynamic Systems
Estimation Methods for Nonregenerative Stochastic Petri Nets

IEEE Transactions on Software Engineering
Simulation of processes with multiple regeneration sequences

Probability in the Engineering and Informational Sciences
Output analysis: simulation output analysis

Proceedings of the 34th conference on Winter simulation: exploring new frontiers
Adaptive Monte Carlo methods for rare event simulation: adaptive Monte Carlo methods for rare event simulations

Proceedings of the 34th conference on Winter simulation: exploring new frontiers
Recent advances in simulation optimization: confidence regions for stochastic approximation algorithms

Proceedings of the 34th conference on Winter simulation: exploring new frontiers
Productivity improvement: throughput sensitivity analysis using a single simulation

Proceedings of the 34th conference on Winter simulation: exploring new frontiers
Output analysis: analysis of simulation output

Proceedings of the 35th conference on Winter simulation: driving innovation
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

The Journal of Machine Learning Research
The semi-regenerative method of simulation output analysis

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Sensitivity analysis for transient single server queuing models using an interpolation approach

WSC '04 Proceedings of the 36th conference on Winter simulation
Output analysis for simulations

Proceedings of the 38th conference on Winter simulation
Measure-Valued Differentiation for Stationary Markov Chains

Mathematics of Operations Research
2008 Special Issue: Reinforcement learning of motor skills with policy gradients

Neural Networks
Optimal parameter trajectory estimation in parameterized SDEs: An algorithmic procedure

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Statistical analysis of simulation output

Proceedings of the 40th Conference on Winter Simulation
Simulating Sensitivities of Conditional Value at Risk

Management Science
Estimating Quantile Sensitivities

Operations Research
Infinite-horizon policy-gradient estimation

Journal of Artificial Intelligence Research
Natural actor-critic algorithms

Automatica (Journal of IFAC)
Quantile Sensitivity Estimation

NET-COOP '09 Proceedings of the 3rd Euro-NF Conference on Network Control and Optimization
Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning

Neural Computation
A brief introduction to optimization via simulation

Winter Simulation Conference
Efficient gradient estimation using finite differencing and likelihood ratios for kinetic Monte Carlo simulations

Journal of Computational Physics
Stochastic Trust-Region Response-Surface Method (STRONG) - A New Response-Surface Framework for Simulation Optimization

INFORMS Journal on Computing
Modeling and simulation for product design process

Simulation

Abstract

Consider a computer system having a CPU that feeds jobs to two input/output (I/O) devices having different speeds. Let θ be the fraction of jobs routed to the first I/O device, so that 1 - θ is the fraction routed to the second. Suppose that α = α(θ) is the steady-state amount of time that a job spends in the system. Given that θ is a decision variable, a designer might wish to minimize α(θ) over θ. Since α(·) is typically difficult to evaluate analytically, Monte Carlo optimization is an attractive methodology. By analogy with deterministic mathematical programming, efficient Monte Carlo gradient estimation is an important ingredient of simulation-based optimization algorithms. As a consequence, gradient estimation has recently attracted considerable attention in the simulation community.

It is our goal, in this article, to describe one efficient method for estimating gradients in the Monte Carlo setting, namely the likelihood ratio method (also known as the efficient score method). This technique has been previously described (in less general settings than those developed in this article) in [6, 16, 18, 21]. An alternative gradient estimation procedure is infinitesimal perturbation analysis; see [11, 12] for an introduction. While infinitesimal perturbation analysis is typically more difficult to apply to a given application than the likelihood ratio technique of interest here, it often turns out to be statistically more accurate.

In this article, we first describe two important problems which motivate our study of efficient gradient estimation algorithms. Next, we present the likelihood ratio gradient estimator in a general setting in which the essential idea is most transparent. The section that follows then specializes the estimator to discrete-time stochastic processes. We derive likelihood ratio gradient estimators for both time-homogeneous and non-time-homogeneous discrete-time Markov chains.

Later, we discuss likelihood ratio gradient estimation in continuous time. As examples of our analysis, we present the gradient estimators for time-homogeneous continuous-time Markov chains; non-time-homogeneous continuous-time Markov chains; semi-Markov processes; and generalized semi-Markov processes. (The analysis throughout these sections assumes that the performance measure defining α(θ) corresponds to a terminating simulation.) Finally, we conclude the article with a brief discussion of the basic issues that arise in extending the likelihood ratio gradient estimator to steady-state performance measures.
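
To make the likelihood ratio idea concrete, the following is a minimal sketch for a deliberately simplified version of the routing example; it is not taken from the article. It assumes a single job is routed to the first device with probability θ and to the second otherwise, that service times are exponential with hypothetical means `mean1` and `mean2`, and that the performance measure is the expected service time of that one job (a terminating quantity with no queueing). Under these assumptions the gradient is known exactly, `mean1 - mean2`, so the estimate can be checked. The function name and all parameters are illustrative.

```python
import random


def lr_gradient_estimate(theta, mean1, mean2, n_samples, seed=0):
    """Likelihood ratio (score function) estimate of d/dtheta E[Y].

    Toy model (an assumption for illustration): one job is routed to
    device 1 with probability theta, else to device 2; Y is its
    exponential service time with mean mean1 or mean2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        to_first = rng.random() < theta
        mean = mean1 if to_first else mean2
        # expovariate takes the rate, i.e. 1 / mean
        y = rng.expovariate(1.0 / mean)
        # Score: derivative in theta of the log-probability of the
        # routing decision; the service-time density does not depend
        # on theta, so it contributes nothing.
        score = 1.0 / theta if to_first else -1.0 / (1.0 - theta)
        total += y * score
    return total / n_samples


estimate = lr_gradient_estimate(theta=0.3, mean1=1.0, mean2=2.0,
                                n_samples=200_000)
exact = 1.0 - 2.0  # mean1 - mean2 for this toy model
print(f"LR estimate: {estimate:.3f}, exact gradient: {exact:.3f}")
```

The simulation is run only at the nominal value of θ; the gradient information comes entirely from weighting each observation by the score of the random routing decision. In a longer terminating simulation with many routing decisions, the score would be the sum of the per-decision terms, which is one reason the variance of such estimators tends to grow with the length of the run.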