This paper presents a novel approach that identifies states with similar sub-policies and incorporates this similarity into the reinforcement learning framework to improve learning performance. The method extracts common action sequences of states, derived from possible optimal policies, and represents them in a tree structure. Based on the number of shared sequences, we define a similarity function between two states, which allows updates to the action-value function of one state to be reflected onto all similar states. In this way, experience acquired during learning is applied in a broader context. The effectiveness of the method is demonstrated empirically.
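The core mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the state names, the hand-specified action-sequence sets, the particular similarity formula (fraction of shared sequences), and the scaled-propagation rule are all assumptions made for the sake of a runnable example; the paper derives the action sequences from possible optimal policies via a tree structure.

```python
from collections import defaultdict

# Hypothetical setup: each state is tagged with the set of action sequences
# that can begin an optimal path from it (given by hand here; the paper
# derives these from possible optimal policies and stores them in a tree).
action_seqs = {
    "s1": {("up", "right"), ("right", "up")},
    "s2": {("up", "right"), ("right", "right")},
    "s3": {("down", "left")},
}

def similarity(a, b):
    """Assumed similarity: fraction of action sequences the two states share."""
    shared = len(action_seqs[a] & action_seqs[b])
    total = max(len(action_seqs[a]), len(action_seqs[b]))
    return shared / total if total else 0.0

Q = defaultdict(float)          # Q[(state, action)], initialized to 0
ALPHA, GAMMA = 0.1, 0.9
ACTIONS = ["up", "down", "left", "right"]

def update(s, a, r, s_next):
    """One Q-learning step whose TD update is also reflected, scaled by
    similarity, onto every state similar to s."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    td = r + GAMMA * best_next - Q[(s, a)]
    Q[(s, a)] += ALPHA * td
    for other in action_seqs:
        if other != s:
            sim = similarity(s, other)
            if sim > 0.0:
                Q[(other, a)] += ALPHA * sim * td  # propagate scaled update

update("s1", "up", 1.0, "s2")
```

After this single update, the experience gathered at `s1` also raises the value estimate of `s2` (which shares one of its two action sequences), while the dissimilar `s3` is untouched; this is the sense in which one state's update is applied in a broader context.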