Many obstacles and paths exist in a real environment, which makes such an environment difficult for an agent to learn. Q-learning is suitable in such cases because it does not require a model of the environment. Through Q-learning, an agent learns to reach a state in which it receives a reward for selecting an action. However, the agent has no information about how to reach that reward. In the initial stage of learning, it sometimes selects actions that move it to states from which it cannot receive a reward, which increases both the learning time and the cost of finding an optimal action. If a model is created automatically to assist the agent's Q-learning, both the time and cost problems can be addressed together. In this paper, we propose a method that creates such a model automatically by dynamic programming. The model enables the agent to notice when it is close to a state in which it can receive a reward; once the agent enters the field of reward influence, the model drives it toward that state. In experiments, we compared the success rates of conventional Q-learning and the proposed method. Our results showed that the success rate of the proposed method was approximately 187% higher than that of Q-learning.
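To make the baseline concrete, the following is a minimal sketch of the conventional tabular Q-learning the abstract refers to, on a hypothetical 1-D corridor environment of our own choosing (the paper's actual environments, parameters, and the proposed model-creation step are not shown here). The corridor size, learning rate, and discount factor are illustrative assumptions, not values from the paper.

```python
import random

def train_q_learning(n_states=6, goal=5, episodes=500,
                     alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on an assumed 1-D corridor: states 0..n_states-1,
    actions 0 (step left) and 1 (step right); reward 1 only on reaching `goal`.
    This is a generic textbook baseline, not the paper's proposed method."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q-table: q[state][action]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = int(q[s][1] >= q[s][0])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == goal else 0.0
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train_q_learning()
# Greedy policy for the non-goal states (1 = step right, toward the reward).
policy = [int(qs[1] >= qs[0]) for qs in q[:5]]
```

Because the reward signal only propagates backward one update at a time, the agent wanders blindly early on, which is exactly the slow initial stage the proposed dynamic-programming model is intended to shortcut.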