Reinforcement Learning in Fine Time Discretization

Authors:
Paweł Wawrzyński
Affiliations:
Institute of Control and Computation Engineering, Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
Venue:
ICANNGA '07 Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part I
Year:
2007

Abstract

Reinforcement Learning (RL) is analyzed here as a tool for control system optimization. State and action spaces are assumed to be continuous. Time is assumed to be discrete, yet the discretization may be arbitrarily fine. It is shown that stationary policies, which most RL methods apply, are unsuitable for control applications because, under fine time discretization, they cannot ensure bounded variance of policy gradient estimators. As a remedy, we propose the use of piecewise non-Markov policies. Policies of this type can be optimized by most RL algorithms, namely those based on the likelihood ratio.
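
The variance claim can be illustrated with a small numerical sketch. The toy problem below is an assumption made for illustration and is not taken from the paper: a one-dimensional integrator dx/dt = a, a Gaussian policy with mean theta, a terminal quadratic cost, and a likelihood-ratio gradient estimate without a baseline. The function name and all parameter values are hypothetical. The "piecewise" variant simply holds one noise draw over a fixed time interval, which is one simple reading of a piecewise non-Markov policy and not necessarily the construction used by the author.

```python
import numpy as np

def lr_gradient_variance(delta, hold_time=None, horizon=1.0, theta=0.5,
                         sigma=0.3, target=1.0, episodes=2000, seed=0):
    """Empirical variance of a likelihood-ratio gradient estimate w.r.t. theta.

    Toy setting (an assumption, not the paper's experiment):
    dx/dt = a, x(0) = 0, return = -(x(horizon) - target)**2.
    hold_time=None draws fresh action noise at every step (stationary policy);
    otherwise one noise draw is held for hold_time units of time.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon / delta))
    # Number of consecutive steps that share a single noise draw.
    hold = 1 if hold_time is None else max(1, int(round(hold_time / delta)))
    n_draws = -(-n_steps // hold)  # ceiling division
    eps = rng.standard_normal((episodes, n_draws))
    # Spread each draw over the steps it covers.
    noise = np.repeat(eps, hold, axis=1)[:, :n_steps]
    actions = theta + sigma * noise
    x_final = delta * actions.sum(axis=1)  # Euler integration of dx/dt = a
    ret = -(x_final - target) ** 2
    # Score function: only independent draws contribute,
    # d/dtheta log N(u; theta, sigma^2) = (u - theta) / sigma^2 = eps / sigma.
    score = eps.sum(axis=1) / sigma
    return float(np.var(ret * score))

if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        stationary = lr_gradient_variance(delta)
        piecewise = lr_gradient_variance(delta, hold_time=0.1)
        print(f"delta={delta}: stationary={stationary:.2f}, piecewise={piecewise:.2f}")
```

In this toy setting the stationary variant's variance grows roughly in proportion to 1/delta, because the number of independent score terms grows while the noise in the return averages out. The held-noise variant keeps the number of independent draws fixed, so its variance stays at about the same level as the step shrinks.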