We present Object Focused Q-learning (OF-Q), a novel reinforcement learning algorithm that can offer exponential speed-ups over classic Q-learning on domains composed of independent objects. An OF-Q agent treats the state space as a collection of objects organized into different object classes. Our key contribution is a control policy that uses non-optimal Q-functions to estimate the risk of ignoring parts of the state space. We compare our algorithm to traditional Q-learning and previous arbitration algorithms in two domains, including a version of Space Invaders.
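The abstract describes the core idea: one Q-function per object class, with a control policy that arbitrates among them. As a rough illustration, here is a minimal sketch of that decomposition, assuming a simple additive arbitration rule (act greedily on the summed per-object Q-values) as a stand-in for the paper's risk-based control policy; the class `ObjectFocusedQ`, its method names, and the shared-reward update are illustrative assumptions, not the paper's exact formulation.

```python
import random
from collections import defaultdict

class ObjectFocusedQ:
    """Simplified sketch of an object-focused Q-learner.

    One Q-table per object class, keyed by the agent's state relative
    to a single object. The arbitration rule (greedy on summed
    per-object Q-values) is an illustrative stand-in for OF-Q's
    risk-based control policy, which is not reproduced here.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # q[class_name][(object_state, action)] -> estimated value
        self.q = defaultdict(lambda: defaultdict(float))

    def value(self, obj_states, action):
        # Sum Q-values over all visible objects, each evaluated in
        # its own class-specific Q-function.
        return sum(self.q[cls][(s, action)] for cls, s in obj_states)

    def act(self, obj_states):
        # Epsilon-greedy over the combined per-object values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(obj_states, a))

    def update(self, obj_states, action, reward, next_obj_states):
        # One-step Q-learning backup, applied to each object's
        # class Q-function (a simplification: OF-Q learns per-object
        # estimates rather than sharing one global reward this way).
        best_next = max(self.value(next_obj_states, a) for a in self.actions)
        for cls, s in obj_states:
            old = self.q[cls][(s, action)]
            self.q[cls][(s, action)] = old + self.alpha * (
                reward + self.gamma * best_next - old)
```

Because each Q-table is indexed by a single object's relative state rather than the joint state of all objects, the number of learned entries grows linearly, not exponentially, in the number of objects, which is the source of the potential speed-up the abstract refers to.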