A new criterion using information gain for action selection strategy in reinforcement learning

  • Authors:
  • K. Iwata; K. Ikeda; H. Sakai

  • Affiliations:
  • Graduate School of Informatics, Kyoto University, Japan

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 2004

Abstract

In this paper, we regard the sequence of returns as the output of a parametric compound source. Using the fact that the coding rate of the source indicates the amount of information about the return, we describe ℓ-learning algorithms, based on the idea of predictive coding, that estimate the expected information gain concerning future information, and we give a convergence proof for the information gain. Using the information gain, we propose the ratio ω of return loss to information gain as a new criterion for probabilistic action-selection strategies. In experiments, the ω-based strategy performed well compared with the conventional Q-based strategy.
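
As a rough illustration of how a ratio criterion of this kind might drive probabilistic action selection, the Python sketch below may help. It is not the authors' algorithm: the abstract gives no formulas, so every definition here is an assumption. Return loss is taken to be the gap between the best Q-value and the action's Q-value, the information-gain estimates are supplied as plain numbers rather than learned, and selection uses a Boltzmann-style rule that favours small ω. The names `omega_ratios`, `selection_probabilities`, `select_action`, `eps`, and `temperature` are hypothetical.

```python
import math
import random


def omega_ratios(q_values, info_gains, eps=1e-8):
    """Assumed criterion: omega(a) = (max_b Q(b) - Q(a)) / (gain(a) + eps).

    The return-loss definition and the eps guard against division by
    zero are illustrative assumptions, not taken from the paper.
    """
    best = max(q_values)
    return [(best - q) / (g + eps) for q, g in zip(q_values, info_gains)]


def selection_probabilities(omegas, temperature=1.0):
    """Boltzmann-style weights that favour actions with a small omega."""
    scaled = [-w / temperature for w in omegas]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, info_gains, temperature=1.0, rng=random):
    """Sample an action index according to the omega-based probabilities."""
    omegas = omega_ratios(q_values, info_gains)
    probs = selection_probabilities(omegas, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 0.5, 0.5]      # hypothetical action-value estimates
    gain = [1.0, 1.0, 2.0]   # hypothetical information-gain estimates
    print(selection_probabilities(omega_ratios(q, gain)))
    print(select_action(q, gain))
```

Under these assumptions, two actions with the same return loss are not treated equally: the one with the larger estimated information gain has the smaller ω and is therefore selected more often, which is the trade-off the criterion is meant to express.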