Automatic Discovery of Subgoals in Reinforcement Learning Using Unique-Direction Value

  • Authors:
  • Chuan Shi; Rui Huang; Zhongzhi Shi

  • Affiliations:
  • Chuan Shi: Institute of Computing Technology, Chinese Academy of Sciences, 100080 / Graduate School of the Chinese Academy of Sciences, 100039 / Beijing University of Posts and Telecommunications
  • Rui Huang: Institute of Computing Technology, Chinese Academy of Sciences, 100080 / Graduate School of the Chinese Academy of Sciences, 100039 (huangr@ics.ict.ac.cn)
  • Zhongzhi Shi: Institute of Computing Technology, Chinese Academy of Sciences, 100080 (shizz@ics.ict.ac.cn)

  • Venue:
  • COGINF '07 Proceedings of the 6th IEEE International Conference on Cognitive Informatics
  • Year:
  • 2007

Abstract

Options have proven useful for discovering hierarchical structure in reinforcement learning and for accelerating learning. The key problem in automatic option discovery is finding subgoals. Although approaches based on visit frequency have attracted much research attention, many of them fail to distinguish subgoals from their neighboring states. Based on an action-restricted property of subgoals identified in this work, subgoals can be regarded as the best-matching action-restricted states along paths. For grid-world environments, the concept of a unique-direction value, which embodies the action-restricted property, is introduced to find these best-matching action-restricted states. Experimental results show that the proposed approach finds subgoals correctly, and that Q-learning with the discovered options speeds up learning greatly.
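
The abstract names the unique-direction value but does not define it. Below is a minimal sketch of one plausible reading, assuming the value measures how unidirectional the agent's exits from a state are across recorded grid-world trajectories; the function names `unique_direction_values` and `candidate_subgoals` and the `threshold`/`min_visits` parameters are illustrative assumptions, not the paper's.

```python
from collections import defaultdict

# Hedged sketch, not the paper's algorithm. Assumption: a state's
# unique-direction value is the fraction of its visits, across recorded
# trajectories, in which the agent exited via its single most frequent
# action. Doorway-like states, where movement is restricted to one
# direction, score near 1.0; states in open rooms score lower.

def unique_direction_values(trajectories):
    """trajectories: list of episodes, each a list of (state, action) pairs."""
    action_counts = defaultdict(lambda: defaultdict(int))
    for episode in trajectories:
        for state, action in episode:
            action_counts[state][action] += 1
    return {
        state: max(counts.values()) / sum(counts.values())
        for state, counts in action_counts.items()
    }

def candidate_subgoals(trajectories, threshold=0.9, min_visits=10):
    """Keep frequently visited states whose exits are nearly unidirectional.

    threshold and min_visits are illustrative parameters, not the paper's.
    """
    visits = defaultdict(int)
    for episode in trajectories:
        for state, _ in episode:
            visits[state] += 1
    udv = unique_direction_values(trajectories)
    return [s for s, v in udv.items()
            if v >= threshold and visits[s] >= min_visits]
```

Under this reading, a doorway between two rooms is traversed in essentially one direction on successful paths and scores near 1.0, while its open-room neighbors admit several exit directions and score lower, which is consistent with the abstract's claim that the action-restricted property separates subgoals from their nearby states.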