Rollout Sampling Approximate Policy Iteration
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which represent policies using classifiers and treat policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme that addresses the core sampling problem in evaluating a policy through simulation by treating it as a multi-armed bandit problem. The resulting algorithm achieves performance comparable to that of the previous algorithm, but with significantly less computational effort. An order-of-magnitude improvement in computational effort is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.
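To make the bandit view of rollout sampling concrete, here is a minimal Python sketch, not the paper's actual algorithm. Each candidate state is treated as an arm, and each new round of rollouts goes to the state with the highest optimistic score. The score used here (empirical gap between the two best actions plus a UCB1-style exploration bonus), the toy simulator `simulate_return`, and the names `allocate_rollouts`, `gaps`, and `c` are all assumptions made for illustration.

```python
import math
import random


def simulate_return(gap, action, rng):
    """Hypothetical simulator: noisy rollout return for one action.

    Toy assumption: action 1 is better than every other action by `gap`.
    """
    mean = gap if action == 1 else 0.0
    return mean + rng.gauss(0.0, 1.0)


def allocate_rollouts(gaps, actions, budget, rng, c=1.0):
    """Spend `budget` rollout rounds across states, bandit-style.

    gaps[s] parameterizes the toy simulator for state s. Returns the number
    of rounds spent per state and the empirically best action per state.
    """
    num_states = len(gaps)
    counts = [0] * num_states
    sums = [[0.0 for _ in actions] for _ in range(num_states)]

    def score(s, rounds_done):
        if counts[s] == 0:
            return math.inf  # sample every state at least once
        means = sorted((total / counts[s] for total in sums[s]), reverse=True)
        gap = means[0] - means[1]  # how clearly one action dominates
        bonus = c * math.sqrt(math.log(rounds_done) / counts[s])
        return gap + bonus

    for rounds_done in range(budget):
        s = max(range(num_states), key=lambda i: score(i, rounds_done))
        # One round = one rollout per action from the chosen state.
        for k, action in enumerate(actions):
            sums[s][k] += simulate_return(gaps[s], action, rng)
        counts[s] += 1

    # Within a state all actions share the same count, so comparing sums
    # is equivalent to comparing empirical means.
    labels = [
        actions[max(range(len(actions)), key=lambda k: sums[s][k])]
        for s in range(num_states)
    ]
    return counts, labels


if __name__ == "__main__":
    counts, labels = allocate_rollouts(
        gaps=[0.1, 0.5, 2.0], actions=[0, 1], budget=60, rng=random.Random(0)
    )
    print("rounds per state:", counts)
    print("estimated best action per state:", labels)
```

In a classifier-based scheme like the one described above, the `(state, label)` pairs would serve as training examples for the policy classifier. Uniform sampling would instead spend the same number of rollouts on every state, which is the cost the bandit allocation aims to reduce.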