In several agent-oriented scenarios in the real world, an autonomous agent situated in an unknown environment must learn, through a process of trial and error, to take actions that result in long-term benefit. Reinforcement Learning (or sequential decision making) is a paradigm well-suited to this requirement. Value function-based methods and policy search methods are contrasting approaches to solving reinforcement learning tasks. While both classes of methods benefit from independent theoretical analyses, these analyses often fail to extend to the practical situations in which the methods are deployed. We conduct an empirical study to examine the strengths and weaknesses of these approaches by introducing a suite of test domains that can be varied for problem size, stochasticity, function approximation, and partial observability. Our results indicate clear patterns in the domain characteristics for which each class of methods excels. We investigate whether their strengths can be combined, and develop an approach to achieve that purpose. The effectiveness of this approach is also demonstrated on the challenging benchmark task of robot soccer Keepaway. We highlight several lines of inquiry that emanate from this study.
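To make the contrast between the two method classes concrete, here is a minimal sketch (not from the paper) on a hypothetical five-state chain MDP: a value function-based learner (tabular Q-learning) estimates action values from sampled transitions, while a policy search learner evaluates whole policies by their episodic return and keeps the best one found. The domain, hyperparameters, and function names below are illustrative assumptions only.

```python
import random

# Hypothetical toy domain: a chain of states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

# --- Value function-based approach: tabular Q-learning ---
def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection over current value estimates.
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # One-step temporal-difference update toward the bootstrapped target.
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# --- Policy search approach: direct search over deterministic policies ---
def evaluate(policy, gamma=0.9, horizon=20):
    # Return the discounted episodic return of a fixed policy; no value
    # estimates are maintained, the policy is scored as a whole.
    s, total, disc = 0, 0.0, 1.0
    for _ in range(horizon):
        s, r, done = step(s, policy[s])
        total += disc * r
        disc *= gamma
        if done:
            break
    return total

def policy_search(samples=50, seed=0):
    rng = random.Random(seed)
    best, best_ret = None, float("-inf")
    for _ in range(samples):
        candidate = [rng.randrange(2) for _ in range(N_STATES)]
        ret = evaluate(candidate)
        if ret > best_ret:
            best, best_ret = candidate, ret
    return best, best_ret
```

Both learners solve this toy chain easily; the paper's point is that their relative performance diverges as problem size, stochasticity, function approximation, and partial observability are varied.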