On-Line Policy Gradient Estimation with Multi-Step Sampling

  • Authors:
  • Yan-Jie Li; Fang Cao; Xi-Ren Cao

  • Affiliations:
  • Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong and Division of Control and Mechatronics Engineering, Harbin Institute of Technology; Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong and School of Electronics and Information Engineering, Beijing Jiaotong University; Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong

  • Venue:
  • Discrete Event Dynamic Systems
  • Year:
  • 2010

Abstract

In this note, we discuss the problem of sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption; when this assumption does not hold, they may produce poor gradient estimates. We show that the assumption can be relaxed and propose algorithms with multi-step sampling for performance gradient estimation; these algorithms do not require the standard assumption. Simulation examples are given to illustrate the accuracy of the estimates.
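
The following is a minimal sketch of the standard one-step likelihood-ratio estimator against which the paper's relaxation is posed, not the authors' multi-step algorithm. The Markov chain, its parameterized transition matrix P(theta), the derivative dP(theta), the reward vector, and the function lr_gradient_estimate are all hypothetical choices made for illustration; they show where the importance-sampling ratio dP(i,j)/P(i,j) appears and why it requires P(i,j) > 0 on every transition with a nonzero derivative.

```python
import numpy as np

# Hypothetical 3-state Markov chain whose transition matrix P(theta) is
# differentiable in a scalar parameter theta; rewards depend on the state.
# Illustrative only: a standard one-step likelihood-ratio estimator of the
# average-reward gradient d(eta)/d(theta), not the paper's multi-step method.

def P(theta):
    # Rows sum to 1 for theta in (0, 0.5); entries chosen arbitrarily.
    return np.array([
        [0.5 - theta, 0.3 + theta, 0.2],
        [0.2,         0.5,         0.3],
        [0.4 + theta, 0.1,         0.5 - theta],
    ])

def dP(theta):
    # Elementwise derivative of P with respect to theta.
    return np.array([
        [-1.0, 1.0,  0.0],
        [ 0.0, 0.0,  0.0],
        [ 1.0, 0.0, -1.0],
    ])

reward = np.array([1.0, 0.0, 2.0])

def lr_gradient_estimate(theta, horizon=50_000, trunc=50, seed=0):
    """One-step likelihood-ratio estimate of d(eta)/d(theta) along a sample path.

    The weight dP[x_t, x_{t+1}] / P[x_t, x_{t+1}] is only well defined when
    P[x_t, x_{t+1}] > 0 for every transition with a nonzero derivative --
    the 'standard importance sampling assumption' the abstract refers to.
    """
    rng = np.random.default_rng(seed)
    Pm, dPm = P(theta), dP(theta)
    # Simulate one sample path of the chain.
    x, path = 0, [0]
    for _ in range(horizon + trunc):
        x = rng.choice(3, p=Pm[x])
        path.append(x)
    rewards = reward[path]
    eta = rewards[:horizon].mean()          # average-reward estimate
    grad = 0.0
    for t in range(horizon):
        ratio = dPm[path[t], path[t + 1]] / Pm[path[t], path[t + 1]]
        # Truncated sum of future centred rewards approximates the potential.
        g_hat = np.sum(rewards[t + 1: t + 1 + trunc] - eta)
        grad += ratio * g_hat
    return grad / horizon

print(lr_gradient_estimate(0.1))
```

In this toy chain every entry of P(theta) is strictly positive, so the ratio is always defined; the situation the abstract addresses is when some transitions with nonzero derivative have small or zero probability under the sampling policy, which is where multi-step sampling becomes relevant.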