Many obstacles and paths exist in a real environment, which makes such an environment difficult for an agent to learn. Q-learning is suitable in such cases because it does not require a model of the environment. Through Q-learning, an agent learns to reach a state in which it receives a reward for selecting an action. However, the agent has no information about how to reach that reward. In the initial stage of learning, it sometimes selects actions that move it to states from which it cannot receive a reward, which increases both the learning time and the cost of finding an optimal action. If a model is created automatically to assist the agent's Q-learning, both the time and cost problems can be addressed together. In this paper, we propose a method that creates such a model automatically by dynamic programming. The model enables the agent to notice when it is close to a state in which it can receive a reward; once the agent enters the field of reward influence, the model drives it toward that state. In experiments, we compared the success rates of conventional Q-learning and the proposed method. Our results showed that the success rate of the proposed method was approximately 187% higher than that of Q-learning.
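To make the baseline concrete, the following is a minimal sketch of the conventional tabular Q-learning the abstract refers to, on a hypothetical 1-D corridor environment of our own choosing (the paper's actual environments, parameters, and the proposed model-creation step are not shown here). The corridor size, learning rate, and discount factor are illustrative assumptions, not values from the paper.

```python
import random

def train_q_learning(n_states=6, goal=5, episodes=500,
                     alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on an assumed 1-D corridor: states 0..n_states-1,
    actions 0 (step left) and 1 (step right); reward 1 only on reaching `goal`.
    This is a generic textbook baseline, not the paper's proposed method."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q-table: q[state][action]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = int(q[s][1] >= q[s][0])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == goal else 0.0
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train_q_learning()
# Greedy policy for the non-goal states (1 = step right, toward the reward).
policy = [int(qs[1] >= qs[0]) for qs in q[:5]]
```

Because the reward signal only propagates backward one update at a time, the agent wanders blindly early on, which is exactly the slow initial stage the proposed dynamic-programming model is intended to shortcut.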