This paper considers the problem of computing an optimal policy for a Markov decision process in the absence of complete a priori knowledge of (1) the branching probability distributions that determine the evolution of the process state upon the execution of the different actions, and (2) the probability distributions that characterize the immediate rewards returned by the environment when these actions are executed at the various states of the process. In addition, it is assumed that the underlying process evolves in a repetitive, episodic manner, with each episode starting from a well-defined initial state and evolving over an acyclic state space. A novel, efficient algorithm for this problem is proposed, and its convergence properties and computational complexity are rigorously characterized in the formal framework of computational learning theory. Furthermore, in the process of deriving these results, the presented work generalizes Bechhofer's "indifference-zone" approach to the ranking & selection problem of statistical inference theory so that it applies to populations with bounded general distributions.
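The ranking & selection primitive underlying such an algorithm can be illustrated with a simple sketch. The Python snippet below is a minimal, naive version of indifference-zone selection for populations with bounded general distributions, setting the per-action sample size via Hoeffding's inequality and a union bound; the function names, the uniform-sampling scheme, and the sample-size formula are illustrative assumptions, not the paper's actual procedure.

import math
import random

def indifference_zone_select(sample, k, eps, delta, support=1.0):
    """Return an action whose true mean reward is within eps of the best,
    with probability at least 1 - delta, for rewards bounded in [0, support].

    Naive uniform-sampling sketch (not the paper's procedure): if every
    empirical mean lands within eps/2 of its true mean (Hoeffding plus a
    union bound over the k actions), the empirically best action is
    eps-optimal.
    """
    # Per-action sample size from Hoeffding's inequality and a union bound.
    n = math.ceil(2.0 * (support / eps) ** 2 * math.log(2.0 * k / delta))
    # Draw n independent samples of each action and compare empirical means.
    means = [sum(sample(a) for _ in range(n)) / n for a in range(k)]
    return max(range(k), key=lambda a: means[a])

# Example: three actions with Bernoulli rewards; action 2 has the highest
# mean and is selected with probability at least 1 - delta.
probs = [0.3, 0.5, 0.6]
best = indifference_zone_select(lambda a: float(random.random() < probs[a]),
                                k=3, eps=0.05, delta=0.05)
print(best)

Under these assumptions the selected action is eps-optimal with probability at least 1 - delta, and the per-action sample size grows only logarithmically in the number of candidate actions; the paper's refinement of Bechhofer's approach targets exactly this kind of guarantee for bounded general distributions.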