A study of Q-learning considering negative rewards

  • Authors:
  • Takayasu Fuchida, Kathy Thi Aung, Atsushi Sakuragi

  • Affiliation (all authors):
  • Graduate School of Science and Engineering, Kagoshima University, Kagoshima, Japan 890-0065

  • Venue:
  • Artificial Life and Robotics
  • Year:
  • 2010

Abstract

In a reinforcement learning system, the agent obtains a positive reward, such as 1, when it achieves its goal. The positive reward propagates through the states around the goal, and the agent gradually learns to reach it. If we want the agent to avoid certain situations, such as dangerous places or poison, we can give it a negative reward. In conventional Q-learning, however, a negative reward does not propagate beyond a single state. In this article, we propose a new way to propagate negative rewards, a very simple and efficient technique for Q-learning. Finally, we present computer simulations that demonstrate the effectiveness of the proposed method.
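The limitation the abstract describes can be reproduced with a minimal sketch of standard tabular Q-learning (not the paper's proposed method). In the chain world below, the environment, reward values, and parameters are illustrative assumptions: because the one-step backup takes the maximum over actions, and the Q-values of actions leading toward the goal are nonnegative, the -1 penalty shows up only in the Q-value of the action that steps directly into the penalized state, while the +1 goal reward propagates through every state.

```python
import random

# Illustrative chain world (not from the paper): states 0..5,
# state 0 is "poison" (reward -1), state 5 is the goal (reward +1).
N = 6
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2
ACTIONS = (-1, +1)  # move left or right

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """Return (next_state, reward, done) for the chain world."""
    s2 = min(max(s + a, 0), N - 1)
    if s2 == 0:
        return s2, -1.0, True   # negative reward: poison state
    if s2 == N - 1:
        return s2, +1.0, True   # positive reward: goal state
    return s2, 0.0, False

random.seed(0)
for _ in range(2000):
    s = random.randint(1, N - 2)  # start in a non-terminal state
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # standard Q-learning backup: max over next-state actions
        target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

for s in range(1, N - 1):
    print(s, round(Q[(s, -1)], 3), round(Q[(s, +1)], 3))
```

After training, Q[(1, -1)] is close to -1, but Q[(2, -1)] is positive: the max backup at state 1 picks the goal-directed action, so the penalty never reaches state 2. Addressing this asymmetry is the motivation for the propagation scheme the paper proposes.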