A new criterion using information gain for action selection strategy in reinforcement learning

Authors:
K. Iwata; K. Ikeda; H. Sakai
Affiliations:
Graduate Sch. of Informatics, Kyoto Univ., Japan; -; -
Venue:
IEEE Transactions on Neural Networks
Year:
2004

Citing: 0
Cited by: 4

The asymptotic equipartition property in reinforcement learning and its relation to return maximization
Neural Networks

An Information-Theoretic Class of Stochastic Decision Processes
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02

Stochastic processes for return maximization in reinforcement learning
ICANN'05 Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II

An information-theoretic analysis of return maximization in reinforcement learning
Neural Networks

Abstract

In this paper, we regard the sequence of returns as outputs from a parametric compound source. Utilizing the fact that the coding rate of the source shows the amount of information about the return, we describe ℓ-learning algorithms based on the predictive coding idea for estimating an expected information gain concerning future information and give a convergence proof of the information gain. Using the information gain, we propose the ratio ω of return loss to information gain as a new criterion to be used in probabilistic action-selection strategies. In experimental results, we found that our ω-based strategy performs well compared with the conventional Q-based strategy.
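
The abstract does not give the exact form of the ω criterion or of the selection rule, so the following is only a minimal sketch under stated assumptions. It assumes that the return loss of an action is the gap between the best Q-value and that action's Q-value, that a positive information-gain estimate per action is already available (the paper's ℓ-learning estimator is not reproduced here), and that actions are sampled from a Boltzmann distribution that favors small ω. The names `omega_ratios`, `boltzmann_probs`, `select_action`, `eps`, and `temperature` are hypothetical.

```python
import math
import random


def omega_ratios(q_values, info_gains, eps=1e-8):
    """Ratio of return loss to information gain for each action.

    Assumption: return loss is max(Q) - Q(a). The info_gains values are
    taken as given; eps only guards against division by zero.
    """
    best = max(q_values)
    return [(best - q) / (g + eps) for q, g in zip(q_values, info_gains)]


def boltzmann_probs(scores, temperature=1.0):
    """Softmax over -score / temperature, so a lower score is more likely."""
    logits = [-s / temperature for s in scores]
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - shift) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, info_gains, temperature=1.0, rng=random):
    """Sample an action index with probability decreasing in its omega."""
    probs = boltzmann_probs(omega_ratios(q_values, info_gains), temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Under these assumptions, two actions with the same Q-value are no longer treated alike: the one with the larger information-gain estimate has the smaller ω and is sampled more often, whereas a purely Q-based Boltzmann rule would give both the same probability.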