Rollout sampling approximate policy iteration

Authors:
Christos Dimitrakakis; Michail G. Lagoudakis
Affiliations:
Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands 1098SJ; Department of Electronic and Computer Engineering, Technical University of Crete, Chania, Crete, Greece 73100
Venue:
Machine Learning
Year:
2008

Abstract

Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme that treats the core sampling problem in evaluating a policy through simulation as a multi-armed bandit problem. The resulting algorithm achieves performance comparable to that of the previous algorithm, but with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain car.
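
The idea of treating rollout allocation as a bandit problem can be illustrated with a small sketch. This is not the authors' algorithm: the toy rollout model, the UCB-style score, the z/sqrt(n) stopping margin, and all names (rollout_return, allocate_rollouts, c, z) are assumptions made for illustration only. Each sampled state plays the role of an arm, pulling an arm runs one rollout per action at that state, and a state is retired with an action label once one action appears clearly better. The labelled states could then serve as training examples for a classifier that represents the improved policy.

```python
import math
import random


def rollout_return(state, action, rng):
    """Hypothetical noisy rollout (an assumption, not from the paper).

    Stands in for simulating `action` in `state` and then following the
    current policy. In this toy model action 1 is better when state > 0.
    """
    mean = state if action == 1 else -state
    return mean + rng.gauss(0.0, 1.0)


def allocate_rollouts(states, actions, budget, rng, c=1.0, z=2.0):
    """Spend up to `budget` pulls across states in a bandit-like fashion.

    Pulling a state runs one rollout per action there. A state is labelled
    and retired once the gap between its two best action estimates exceeds
    an assumed confidence margin of z / sqrt(pulls).
    """
    sums = {s: {a: 0.0 for a in actions} for s in states}
    counts = {s: 0 for s in states}
    active = list(states)
    labelled = {}

    for t in range(1, budget + 1):
        if not active:
            break

        def score(s):
            # Unvisited states are tried first.
            if counts[s] == 0:
                return math.inf
            means = sorted(sums[s][a] / counts[s] for a in actions)
            gap = means[-1] - means[-2]
            # Empirical gap plus a UCB-style exploration bonus.
            return gap + c * math.sqrt(math.log(t) / counts[s])

        s = max(active, key=score)
        for a in actions:
            sums[s][a] += rollout_return(s, a, rng)
        counts[s] += 1

        means = {a: sums[s][a] / counts[s] for a in actions}
        ranked = sorted(actions, key=means.get, reverse=True)
        gap = means[ranked[0]] - means[ranked[1]]
        if gap > z / math.sqrt(counts[s]):
            labelled[s] = ranked[0]  # training example: (state, best action)
            active.remove(s)

    return labelled, counts


if __name__ == "__main__":
    rng = random.Random(0)
    labelled, counts = allocate_rollouts([-2.0, -0.1, 0.1, 2.0], [0, 1], 200, rng)
    print("labelled:", labelled)
    print("pulls per state:", counts)
```

In this sketch, states whose actions are easy to tell apart tend to be retired after few pulls, so the remaining budget goes to the states that are harder to resolve. A uniform scheme would instead spend the same number of rollouts on every state.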