For a Markov Decision Process with a finite state space (size S) and a finite action space (size A per state), we propose a new algorithm, Delayed Q-learning. We prove that it is PAC: it achieves near-optimal performance on all but Õ(SA) timesteps while using O(SA) space, improving on the Õ(S²A) bounds of the best previous algorithms. This result shows that efficient reinforcement learning is possible without learning a model of the MDP from experience. Learning takes place from a single continuous thread of experience; neither resets nor parallel sampling is used. Beyond its smaller storage and experience requirements, Delayed Q-learning has a much lower per-experience computation cost than previous PAC algorithms.
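As a rough illustration of the storage claim only, the minimal sketch below counts table entries for two generic tabular learners. It is not taken from the paper and does not implement Delayed Q-learning; the function names and the toy sizes are hypothetical. A model-free learner keeps one value per (state, action) pair, whereas a learner that estimates a tabular transition model keeps one count per (state, action, next state) triple.

```python
def model_free_entries(num_states: int, num_actions: int) -> int:
    """Entries in a Q-table: one value per (state, action) pair."""
    return num_states * num_actions


def model_based_entries(num_states: int, num_actions: int) -> int:
    """Entries in a tabular transition model: one count per (s, a, s') triple."""
    return num_states * num_actions * num_states


if __name__ == "__main__":
    S, A = 1000, 4  # hypothetical toy sizes, chosen only for illustration
    print("model-free entries: ", model_free_entries(S, A))
    print("model-based entries:", model_based_entries(S, A))
```

With these assumed sizes the model-based table is larger by a factor of S, which is the gap between the O(SA) and S²A-type figures quoted above.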