An information-spectrum approach to analysis of return maximization in reinforcement learning

Authors:
Kazunori Iwata
Affiliations:
Graduate School of Information Sciences, Hiroshima City University, Hiroshima, Japan
Venue:
ICONIP'10 Proceedings of the 17th international conference on Neural information processing: theory and algorithms - Volume Part I
Year:
2010

Abstract

In reinforcement learning, Markov decision processes are the most popular stochastic sequential decision processes. Their analysis frequently assumes that the process is stationary, ergodic, or both, but most stochastic sequential decision processes arising in reinforcement learning are, in fact, not necessarily Markovian, stationary, or ergodic. In this paper, we give an information-spectrum analysis of return maximization for processes that are more general than stationary or ergodic Markov decision processes. We also present a class of stochastic sequential decision processes that satisfies the necessary condition for return maximization. Finally, we provide several examples of sequences in this class that are best in terms of return maximization.