Speeding up reinforcement learning using recurrent neural networks in non-Markovian environments

  • Authors:
  • Le Tien Dung; Takashi Komeda; Motoki Takagi

  • Affiliations:
  • Shibaura Institute of Technology, Minuma-ku, Saitama, Japan (all authors)

  • Venue:
  • ASC '07 Proceedings of The Eleventh IASTED International Conference on Artificial Intelligence and Soft Computing
  • Year:
  • 2007

Abstract

Reinforcement Learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov Decision Processes quite well. For Partially Observable Markov Decision Processes, a Recurrent Neural Network (RNN) can be used to approximate Q values. However, learning time for these problems is typically very long. In this paper, we present a method to speed up learning in non-Markovian environments by focusing on the necessary state-action pairs in learning episodes. Whenever the agent attains the goal, it checks the episode and relearns the necessary actions. We use a table storing the minimum number of appearances of each state over all successful episodes to remove unnecessary state-action pairs from a successful episode and form a min-episode. To verify this method, we performed two experiments: the E-maze problem with a Time-Delay Neural Network and the lighting grid world problem with a Long Short-Term Memory (LSTM) RNN. Experimental results show that the proposed method enables an agent to acquire a policy with better learning performance than the standard method.
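
As a rough illustration of the min-episode idea summarized above, the sketch below keeps a table of the minimum number of times each state has appeared in any successful episode, and uses it to trim a newly completed successful episode before it is replayed for relearning. The function names (`update_min_counts`, `form_min_episode`) and the choice to drop the earliest surplus visits of a state are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def update_min_counts(min_counts, episode):
    """Update the table of minimum state-appearance counts with a successful episode.

    min_counts: dict mapping state -> smallest number of times that state has
        appeared in any successful episode seen so far.
    episode: list of (state, action) pairs that reached the goal.
    """
    counts = Counter(state for state, _ in episode)
    for state, n in counts.items():
        min_counts[state] = min(min_counts.get(state, n), n)
    return min_counts

def form_min_episode(min_counts, episode):
    """Trim a successful episode down to a min-episode.

    For each state, keep at most as many visits as the minimum recorded in
    min_counts, dropping the earliest surplus visits (an assumption: the later
    visits are the ones that actually lead to the goal).
    """
    remaining = Counter(state for state, _ in episode)
    kept = []
    for state, action in episode:
        allowed = min_counts.get(state, remaining[state])
        if remaining[state] > allowed:
            # Surplus visit: treat this state-action pair as unnecessary.
            remaining[state] -= 1
            continue
        kept.append((state, action))
    return kept
```

In a training loop, the resulting min-episode would then be replayed through the recurrent Q-function (the TDNN or LSTM network) in place of the full successful episode, so that learning focuses on the state-action pairs that were actually needed to reach the goal.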