Learning by Automatic Option Discovery from Conditionally Terminating Sequences

Authors:
Sertan Girgin; Faruk Polat; Reda Alhajj
Affiliations:
Middle East Technical University, Ankara, Turkey, and University of Calgary, Calgary, Alberta, Canada, girgins@cpsc.ucalgary.ca; Middle East Technical University, Ankara, Turkey, polat@ceng.metu.edu.tr; University of Calgary, Calgary, Alberta, Canada, and Global University, Beirut, Lebanon, alhajj@cpsc.ucalgary.ca
Venue:
Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 -- September 1, 2006, Riva del Garda, Italy
Year:
2006

Abstract

This paper proposes a novel approach for discovering options in the form of conditionally terminating sequences and shows how they can be integrated into the reinforcement learning framework to improve learning performance. The method uses stored histories of possible optimal policies and constructs a specialized tree structure online to identify frequently used action sequences, together with the states visited during the execution of those sequences. The tree is then used to run the corresponding options implicitly. The effectiveness of the method is demonstrated empirically.
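
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of a tree built from stored histories, not the authors' data structure or procedure. It assumes a history is a list of (state, action) pairs and that fixed-length windows of each history are inserted into a trie keyed by action. The names `Node`, `insert_history`, and `frequent_sequences`, and the parameters `max_len`, `min_count`, and `min_len`, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Set, Tuple

State = Hashable
Action = Hashable
Step = Tuple[State, Action]


@dataclass
class Node:
    """One tree node: an action sequence prefix plus bookkeeping (illustrative only)."""
    count: int = 0                                    # how often this prefix was seen
    states: Set[State] = field(default_factory=set)   # states in which its last action was taken
    children: Dict[Action, "Node"] = field(default_factory=dict)


def insert_history(root: Node, history: List[Step], max_len: int) -> None:
    """Insert every window of up to max_len consecutive steps of one history."""
    for start in range(len(history)):
        node = root
        for state, action in history[start:start + max_len]:
            node = node.children.setdefault(action, Node())
            node.count += 1
            node.states.add(state)


def frequent_sequences(root: Node, min_count: int, min_len: int = 2):
    """Return (action_sequence, count) pairs seen at least min_count times."""
    found = []
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        for action, child in node.children.items():
            if child.count < min_count:
                continue  # a longer sequence is never more frequent than its prefix
            seq = prefix + (action,)
            if len(seq) >= min_len:
                found.append((seq, child.count))
            stack.append((child, seq))
    # Sorting assumes the actions are mutually comparable (e.g. all strings).
    return sorted(found)
```

In this sketch, a path whose count passes the threshold plays the role of a candidate action sequence, and the `states` sets along the path record where each of its actions was observed. How the paper actually selects sequences, handles termination conditions, or executes options from the tree is not stated in the abstract and is not modeled here.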