An Information-Theoretic Class of Stochastic Decision Processes
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02
Stochastic processes for return maximization in reinforcement learning
ICANN'05 Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II
In this paper, we regard the sequence of returns as the output of a parametric compound source. Because the coding rate of the source measures the amount of information about the return, we describe ℓ-learning algorithms, based on the idea of predictive coding, that estimate the expected information gain concerning future information, and we give a convergence proof for the information gain. Using the information gain, we propose the ratio ω of return loss to information gain as a new criterion for probabilistic action-selection strategies. In experiments, our ω-based strategy performs well compared with the conventional Q-based strategy.
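The ω criterion can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: the return loss of an action is taken to be the gap between the best Q-value and that action's Q-value, the information gains are supplied as already-estimated numbers (the ℓ-learning estimator itself is not reproduced), and actions are drawn from a Boltzmann (softmax) distribution over -ω. The function names `omega_values`, `boltzmann_probs`, and `select_action` are hypothetical.

```python
import math
import random


def omega_values(q_values, info_gains, eps=1e-12):
    """Assumed form of the criterion: omega = return loss / information gain.

    Return loss is assumed to be max(Q) - Q(a); info_gains are assumed to be
    positive estimates supplied by some separate estimator.
    """
    best = max(q_values)
    return [(best - q) / max(g, eps) for q, g in zip(q_values, info_gains)]


def boltzmann_probs(scores, temperature=1.0):
    """Softmax over scores; a higher score gets a higher probability."""
    scaled = [s / temperature for s in scores]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, info_gains, temperature=1.0, rng=random):
    """Sample an action, preferring small omega (low loss per unit of gain)."""
    omegas = omega_values(q_values, info_gains)
    probs = boltzmann_probs([-w for w in omegas], temperature)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs


if __name__ == "__main__":
    # Actions 1 and 2 have the same Q-value but different information gains.
    q = [1.0, 0.8, 0.8]
    gains = [0.2, 0.4, 0.1]
    action, probs = select_action(q, gains)
    print("omega:", omega_values(q, gains))
    print("omega-based probabilities:", probs)
    print("Q-based probabilities:", boltzmann_probs(q))
    print("sampled action:", action)
```

In this example a Q-based softmax cannot distinguish actions 1 and 2, since their Q-values are equal, whereas the ω-based distribution favors action 1 because it gives up the same return for a larger information gain.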