Efficient sampling strategies for relational database operations
ICDT Selected papers of the 4th international conference on Database theory
An introduction to computational learning theory
Query size estimation by adaptive sampling
Selected papers of the 9th annual ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Introduction to Reinforcement Learning
Near-Optimal Reinforcement Learning in Polynomial Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
From Computational Learning Theory to Discovery Science
ICALP '99 Proceedings of the 26th International Colloquium on Automata, Languages and Programming
Practical Algorithms for On-line Sampling
DS '98 Proceedings of the First International Conference on Discovery Science
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Efficient reinforcement learning in factored MDPs
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Sequential Sampling Techniques for Algorithmic Learning Theory
ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory
Recently, Kearns and Singh presented the first provably efficient and near-optimal algorithm for reinforcement learning in general Markov decision processes. A key contribution of their algorithm is its explicit treatment of the exploration-exploitation trade-off. In this paper, we show how the algorithm can be improved by replacing its exploration phase, which builds a model of the underlying Markov decision process by estimating the transition probabilities, with an adaptive sampling method better suited to the problem. Our improvement is twofold. First, our theoretical bound on the worst-case time needed to converge to an almost optimal policy is significantly smaller. Second, because the sampling method we use is adaptive, we discuss how our algorithm might perform better in practice than the previous one.
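The core idea of the abstract, estimating a transition probability with a sample size that adapts to the unknown probability rather than a fixed worst-case count, can be illustrated with a small sketch. The stopping rule below (draw until a target number of successes is reached) is in the spirit of Lipton-Naughton-style adaptive sampling; the function name, constants, and threshold are illustrative assumptions, not the paper's actual algorithm.

```python
import math
import random

def adaptive_estimate(sample, eps=0.1, delta=0.05):
    """Adaptively estimate a Bernoulli success probability p.

    Instead of fixing the number of draws in advance from a worst-case
    bound, keep sampling until a target number of successes is observed,
    so the total number of draws scales roughly like target/p and adapts
    to the unknown p.  (Illustrative constants; not the paper's bound.)
    """
    # Success-count target giving relative error eps with prob. >= 1 - delta
    target = math.ceil(3 * math.log(2 / delta) / eps**2)
    successes, draws = 0, 0
    while successes < target:
        draws += 1
        if sample():
            successes += 1
    return successes / draws

# Usage: estimate the probability of a transition event with p = 0.3.
random.seed(0)
p_hat = adaptive_estimate(lambda: random.random() < 0.3, eps=0.2)
```

Note how a likely transition is resolved quickly while a rare one automatically receives more draws, which is the practical advantage the abstract alludes to over a fixed-size exploration phase.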