Reinforcement learning through global stochastic search in N-MDPs

  • Authors:
  • Matteo Leonetti, Luca Iocchi, Subramanian Ramamoorthy

  • Affiliations:
  • Matteo Leonetti and Luca Iocchi: Department of Computer and System Sciences, Sapienza University of Rome, Rome, Italy
  • Subramanian Ramamoorthy: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom

  • Venue:
  • ECML PKDD'11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Part II
  • Year:
  • 2011

Abstract

Reinforcement Learning (RL) in either fully or partially observable domains usually imposes a requirement on the knowledge representation for learning to be sound: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a way that a Markovian representation would be computationally very expensive. An alternative formulation of the decision problem involves partially specified behaviors with choice points. While this reduces the complexity of the policy space that must be explored - something that is crucial for realistic autonomous agents that must bound search time - it renders the domain Non-Markovian. In this paper, we present a novel algorithm for reinforcement learning in Non-Markovian domains. Our algorithm, Stochastic Search Monte Carlo, performs a global stochastic search in policy space, shaping the distribution from which the next policy is drawn by estimating an upper bound on the value of each action. We show experimentally that, in domains that are challenging for RL, high-level decisions in Non-Markovian processes can lead to behavior at least as good as that learned by traditional algorithms, while requiring significantly fewer samples.
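
To make the search procedure described in the abstract concrete, the sketch below illustrates one way a global stochastic search over policies defined at choice points could look, with a UCB-style upper bound shaping the distribution from which the next policy is drawn. This is not the authors' implementation: the names choice_points and run_episode, the softmax sampling, and the exploration bonus are all assumptions introduced for illustration.

    import math
    import random
    from collections import defaultdict

    def stochastic_policy_search(choice_points, run_episode, n_iters=1000,
                                 temperature=1.0, exploration_c=1.0):
        """Illustrative sketch of a global stochastic search over deterministic
        policies defined at the choice points of a partially specified behavior.

        choice_points: dict mapping each choice point to its list of actions.
        run_episode:   callable(policy) -> return of one Monte Carlo episode,
                       where policy maps each choice point to a chosen action.
        The UCB-style bound and softmax sampling are assumptions made for
        illustration; the paper's exact estimator may differ.
        """
        returns = defaultdict(list)        # (choice point, action) -> sampled returns
        episodes = 0
        best_policy, best_value = None, float("-inf")

        def upper_bound(cp, a):
            samples = returns[(cp, a)]
            if not samples:
                return None                # untried action: treated as most optimistic
            mean = sum(samples) / len(samples)
            bonus = exploration_c * math.sqrt(math.log(episodes + 1) / len(samples))
            return mean + bonus

        for _ in range(n_iters):
            # Shape the sampling distribution at every choice point using the
            # current upper-bound estimates, then draw a complete policy.
            policy = {}
            for cp, actions in choice_points.items():
                bounds = [upper_bound(cp, a) for a in actions]
                untried = [a for a, b in zip(actions, bounds) if b is None]
                if untried:
                    policy[cp] = random.choice(untried)
                else:
                    m = max(bounds)
                    weights = [math.exp((b - m) / temperature) for b in bounds]
                    policy[cp] = random.choices(actions, weights=weights)[0]

            ret = run_episode(policy)      # one Monte Carlo rollout of the policy
            episodes += 1
            for cp, a in policy.items():
                returns[(cp, a)].append(ret)
            if ret > best_value:
                best_policy, best_value = policy, ret

        return best_policy, best_value

Because the search is over complete assignments of actions to choice points rather than over state-action values, it needs no Markov assumption on the underlying process; the upper-bound estimates only bias the sampling toward promising choices while keeping the search global.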