Learning when to stop thinking and do something!

Authors:
Barnabás Póczos;Yasin Abbasi-Yadkori;Csaba Szepesvári;Russell Greiner;Nathan Sturtevant
Affiliations:
University of Alberta, Edmonton, AB, Canada;University of Alberta, Edmonton, AB, Canada;University of Alberta, Edmonton, AB, Canada;University of Alberta, Edmonton, AB, Canada;University of Alberta, Edmonton, AB, Canada
Venue:
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year:
2009

Abstract

An anytime algorithm can return a response to a given task at essentially any time; typically, the quality of the response improves as more time is allowed. Here, we consider the challenge of learning when to terminate such an algorithm on each of a sequence of i.i.d. tasks, so as to optimize the expected average reward per unit time. We provide a system for addressing this challenge that combines the Cross-Entropy method, a global optimizer, with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that the system is effective by applying it to a toy problem as well as to a real-world face detection task.
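The objective described above (expected average reward per unit time over i.i.d. tasks) and the two-stage optimizer can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the authors' setup. The per-task quality curve a(1 - exp(-t/tau)), the fixed per-task overhead, and the use of a single fixed deadline theta as the stopping rule are all hypothetical, as are the names `sample_task`, `reward_rate`, `cross_entropy`, and `gradient_ascent`. The local step uses a central finite-difference gradient of the sample objective. This is a swapped-in stand-in, not the gradient estimator analyzed in the paper.

```python
import math
import random
import statistics

OVERHEAD = 1.0  # hypothetical fixed time cost per task (e.g. loading an image)


def sample_task(rng):
    """Draw one iid task; its quality curve a * (1 - exp(-t / tau)) is an assumed model."""
    a = rng.uniform(0.5, 1.5)    # reward the anytime algorithm approaches
    tau = rng.uniform(0.5, 3.0)  # time scale over which quality improves
    return a, tau


def reward(task, t):
    """Quality of the answer returned if the algorithm is stopped at time t."""
    a, tau = task
    return a * (1.0 - math.exp(-t / tau))


def reward_rate(theta, tasks):
    """Empirical reward per unit time when every task is stopped at deadline theta.

    The long-run rate is E[reward] / E[time per task]; it is estimated here
    by the ratio of the sample sums over the given tasks."""
    theta = max(theta, 0.0)  # negative deadlines are treated as "stop at once"
    total_reward = sum(reward(task, theta) for task in tasks)
    total_time = len(tasks) * (theta + OVERHEAD)
    return total_reward / total_time


def cross_entropy(objective, mu, sigma, rng, iters=20, pop=50, elite_frac=0.2):
    """Global search: repeatedly refit a Gaussian over theta to the elite samples."""
    n_elite = max(2, int(pop * elite_frac))
    for _ in range(iters):
        samples = sorted((rng.gauss(mu, sigma) for _ in range(pop)),
                         key=objective, reverse=True)
        elite = samples[:n_elite]
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite) + 1e-3  # small floor keeps some exploration
    return mu


def gradient_ascent(objective, theta, step=1.0, h=1e-3, iters=200):
    """Local refinement using a central finite-difference gradient estimate."""
    for _ in range(iters):
        grad = (objective(theta + h) - objective(theta - h)) / (2.0 * h)
        theta = max(theta + step * grad, 0.0)  # project back onto theta >= 0
    return theta


rng = random.Random(0)
tasks = [sample_task(rng) for _ in range(500)]


def J(theta):
    """Sample-average objective on a fixed set of tasks."""
    return reward_rate(theta, tasks)


theta_ce = cross_entropy(J, mu=5.0, sigma=3.0, rng=rng)
theta_star = gradient_ascent(J, theta_ce)
print(f"CE deadline {theta_ce:.3f} -> refined {theta_star:.3f}, rate {J(theta_star):.4f}")
```

In this toy model the Cross-Entropy stage only needs to land in the right region, and gradient ascent then polishes the deadline. Note that `reward_rate` estimates a ratio of expectations by a ratio of sample sums. This estimate is consistent but generally biased for a finite number of tasks, so any gradient computed from it is itself only an estimate of the true gradient.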