An empirical analysis of value function-based and policy search reinforcement learning

Authors:
Shivaram Kalyanakrishnan; Peter Stone
Affiliations:
The University of Texas at Austin; The University of Texas at Austin
Venue:
Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Year:
2009

Abstract

In several agent-oriented scenarios in the real world, an autonomous agent situated in an unknown environment must learn, through trial and error, to take actions that yield long-term benefit. Reinforcement learning (or sequential decision making) is a paradigm well suited to this requirement. Value function-based methods and policy search methods are contrasting approaches to solving reinforcement learning tasks. While both classes of methods benefit from independent theoretical analyses, these analyses often fail to extend to the practical situations in which the methods are deployed. We conduct an empirical study to examine the strengths and weaknesses of these approaches by introducing a suite of test domains that can be varied for problem size, stochasticity, function approximation, and partial observability. Our results indicate clear patterns in the domain characteristics for which each class of methods excels. We investigate whether their strengths can be combined, and develop an approach that does so. The effectiveness of this approach is also demonstrated on the challenging benchmark task of robot soccer Keepaway. We highlight several lines of inquiry that emanate from this study.