A reward field model generation in Q-learning by dynamic programming

  • Authors:
  • Yunsick Sung; Kyungeun Cho; Kyhyun Um

  • Affiliations:
  • Dongguk University, Seoul, Korea; Dongguk University, Seoul, Korea; Dongguk University, Seoul, Korea

  • Venue:
  • Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human
  • Year:
  • 2009

Abstract

Many obstacles and paths exist in a real environment, which makes such an environment difficult for an agent to learn. Q-learning is suitable in such cases because it does not require a predefined model of the environment. With Q-learning, an agent learns to reach a state in which it receives a reward for the action it selects. However, the agent has no information about how to reach that reward. In the initial stage of learning, the agent sometimes selects actions that move it to states from which it cannot receive a reward, so the learning time and the cost of finding an optimal action are increased. If a model that assists the agent during Q-learning is created automatically, both the time and the cost problems can be addressed. In this paper, we propose a method that creates such a model automatically by dynamic programming. The model enables the agent to notice when it is close to a state in which it can receive a reward: once the agent enters the reward field, the model drives it toward the state in which the reward can be received. In experiments, we compared the success rates of conventional Q-learning and the proposed method. Our results showed that the success rate of the proposed method was approximately 187% higher than that of Q-learning.
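
The abstract gives no implementation details, so the following is only a minimal sketch of the idea, not the authors' method: in an assumed 5x5 grid world with a single goal state, a reward field is precomputed by dynamic programming (synchronous sweeps that propagate the goal reward outward over a limited radius), and a tabular Q-learning agent consults that field when choosing actions, falling back to its Q-values outside the field. The grid layout, field radius, and the way the field biases action selection are illustrative assumptions.

import random

ROWS, COLS = 5, 5
GOAL = (4, 4)                                  # assumed reward state
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.1

def in_grid(s):
    return 0 <= s[0] < ROWS and 0 <= s[1] < COLS

def build_reward_field(radius=3):
    # Dynamic programming: synchronous sweeps propagate the goal reward outward,
    # one ring per sweep, so only states within `radius` steps of the goal lie
    # inside the reward field; all other states keep a field value of 0.
    field = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    field[GOAL] = 1.0
    for _ in range(radius):
        updated = dict(field)
        for s in field:
            if s == GOAL:
                continue
            best = max(field[(s[0] + d[0], s[1] + d[1])]
                       for d in ACTIONS if in_grid((s[0] + d[0], s[1] + d[1])))
            updated[s] = max(field[s], GAMMA * best)   # value decays with distance
        field = updated
    return field

def step(s, a):
    nxt = (s[0] + a[0], s[1] + a[1])
    if not in_grid(nxt):
        nxt = s                                        # hitting a wall: stay in place
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=200, max_steps=100):
    field = build_reward_field()
    Q = {((r, c), a): 0.0
         for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(max_steps):
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            elif field[s] > 0.0:
                # Inside the reward field: the field drives the agent toward the
                # reward state via the action whose successor has the highest
                # field value (one assumed way to use the field).
                a = max(ACTIONS, key=lambda b: field[step(s, b)[0]])
            else:
                # Outside the field: ordinary greedy choice on the Q-values.
                a = max(ACTIONS, key=lambda b: Q[(s, b)])
            nxt, r, done = step(s, a)
            best_next = max(Q[(nxt, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])  # Q-learning update
            s = nxt
            if done:
                break
    return Q

if __name__ == "__main__":
    Q = train()
    print({a: round(Q[((0, 0), a)], 3) for a in ACTIONS})

Because the field already points toward the goal near the reward state, the agent wastes fewer early episodes wandering in states that yield no reward, which is the intuition behind the reported improvement in success rate.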