Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Learning cost-sensitive active classifiers
Artificial Intelligence
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
The Journal of Machine Learning Research
Proceedings of the 25th international conference on Machine learning
Provably bounded-optimal agents
Journal of Artificial Intelligence Research
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research
Datum-wise classification: a sequential approach to sparsity
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Hi-index | 0.00 |
An anytime algorithm is capable of returning a response to the given task at essentially any time; typically the quality of the response improves as the time increases. Here, we consider the challenge of learning when we should terminate such algorithms on each of a sequence of iid tasks, to optimize the expected average reward per unit time. We provide a system for addressing this challenge, which combines the global optimizer Cross-Entropy method with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that this system is effective by applying it to a toy problem, as well as on a real-world face detection task.