Stochastic processes for return maximization in reinforcement learning

Authors:
Kazunori Iwata; Hideaki Sakai; Kazushi Ikeda
Affiliations:
Faculty of Information Sciences, Hiroshima City University, Hiroshima, Japan; Graduate School of Informatics, Kyoto University, Kyoto, Japan; Graduate School of Informatics, Kyoto University, Kyoto, Japan
Venue:
ICANN'05 Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II
Year:
2005

Abstract

In the framework of reinforcement learning, an agent learns an optimal policy via return maximization, not via choices instructed by a supervisor. The framework is generally formulated as an ergodic Markov decision process and is designed by tuning some parameters of the action-selection strategy so that the learning process eventually becomes almost stationary. In this paper, we examine a theoretical class of more general processes in which the agent can achieve return maximization, by considering the asymptotic equipartition property of such processes. As a result, we show several necessary conditions that the agent and the environment must satisfy for return maximization to be possible.
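The asymptotic equipartition property (AEP) mentioned in the abstract says that, for a stationary ergodic process, -(1/n) log Pr(X_1, ..., X_n) converges to the entropy rate H, so almost every long sample path has probability close to e^{-nH}. The sketch below illustrates only that classical statement for a simple stationary Markov chain, standing in for the process an agent generates under a fixed policy. It is not the authors' construction: the paper treats more general processes and relates them to return, while the two-state chain, its transition matrix `P`, and the helper names here are all hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical 2-state ergodic Markov chain (illustrative assumption only).
# P[i][j] = probability of moving from state i to state j.
P = [[0.9, 0.1],
     [0.4, 0.6]]


def stationary_distribution(P):
    """Closed-form stationary distribution mu of a 2-state chain."""
    a, b = P[0][1], P[1][0]  # probability of leaving state 0 / state 1
    return [b / (a + b), a / (a + b)]


def entropy_rate(P):
    """Entropy rate H = -sum_i mu_i sum_j P_ij log P_ij, in nats."""
    mu = stationary_distribution(P)
    return -sum(mu[i] * sum(p * math.log(p) for p in P[i] if p > 0)
                for i in range(len(P)))


def sample_log_prob_rate(P, n, seed=0):
    """Draw one path X_1..X_n started from mu and return -(1/n) log Pr(path)."""
    rng = random.Random(seed)
    mu = stationary_distribution(P)
    state = 0 if rng.random() < mu[0] else 1
    log_p = math.log(mu[state])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[state][0] else 1
        log_p += math.log(P[state][nxt])
        state = nxt
    return -log_p / n


if __name__ == "__main__":
    H = entropy_rate(P)
    for n in (100, 10_000, 200_000):
        # The empirical rate should approach H as n grows (AEP).
        print(n, round(sample_log_prob_rate(P, n), 4), round(H, 4))
```

Each printed line pairs the empirical rate for one sampled path with H (about 0.395 nats for this matrix); the gap tends to shrink as n grows, which is the "equipartition" of probability among typical paths.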