Direct Policy Search and Uncertain Policy Evaluation

  • Authors:
  • Juergen Schmidhuber; Jieyu Zhao

  • Affiliations:
  • -;-

  • Venue:
  • -
  • Year:
  • 1998

Abstract

Reinforcement learning based on direct search in policy space requires few assumptions about the environment. Hence it is applicable in certain situations where most traditional reinforcement learning algorithms are not, especially in partially observable, deterministic worlds. In realistic settings, however, reliable policy evaluations are complicated by numerous sources of uncertainty, such as stochasticity in policy and environment. Given a limited lifetime, how much time should a direct policy searcher spend on policy evaluations to obtain reliable statistics? Our efficient approach based on the success-story algorithm (SSA) is radical in the sense that it never stops evaluating any previous policy modification, except those it undoes for lack of empirical evidence that they have contributed to lifelong reward accelerations. While previous experimental research has already demonstrated SSA's applicability to large-scale partially observable environments, a study of why it performs well has been lacking. Here we identify for the first time SSA's fundamental advantages over traditional direct policy search (such as stochastic hill-climbing) on problems involving several sources of stochasticity and uncertainty.
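
For concreteness, below is a minimal Python sketch of the stack-based checkpoint procedure implied by this description, assuming the success-story criterion as stated in the SSA literature: the average reward per time measured since each still-valid policy modification must exceed the corresponding average for every earlier still-valid one, and for the lifelong average. All identifiers (Modification, ssc_holds, ssa_checkpoint) are illustrative, and the policy representation is abstracted into an undo callback; this is a sketch, not the paper's own code.

# Illustrative sketch of SSA's stack-based policy evaluation, assuming the
# success-story criterion: reward per time since each surviving modification
# must strictly increase from the oldest still-valid modification to the
# newest. Identifiers here are hypothetical, not taken from the paper.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Modification:
    """A still-valid policy modification plus the statistics needed to judge it."""
    time_created: float        # lifetime at which the modification occurred
    reward_at_creation: float  # cumulative reward observed at that time
    undo: Callable[[], None]   # restores the policy to its pre-modification state


def ssc_holds(stack: List[Modification], t: float, R: float) -> bool:
    """Success-story criterion: reward per time since each surviving
    modification must strictly increase from the bottom of the stack
    (compared against the lifelong average R/t) to the top.
    Assumes 0 < m.time_created < t for every stack entry."""
    rates = [R / t] + [
        (R - m.reward_at_creation) / (t - m.time_created) for m in stack
    ]
    return all(earlier < later for earlier, later in zip(rates, rates[1:]))


def ssa_checkpoint(stack: List[Modification], t: float, R: float) -> None:
    """At a checkpoint, pop and undo the most recent modifications until the
    remaining ones again tell a 'success story' of reward accelerations."""
    while stack and not ssc_holds(stack, t, R):
        stack.pop().undo()

Because the rates are recomputed from the current lifetime t and cumulative reward R at every checkpoint, each surviving modification remains under evaluation for the rest of the agent's life, which is the sense in which SSA "never stops evaluating" earlier policy modifications.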