In reinforcement learning, Markov decision processes are the most popular stochastic sequential decision processes. Analyses frequently assume that the process is stationary, ergodic, or both, but most stochastic sequential decision processes arising in reinforcement learning are in fact not necessarily Markovian, stationary, or ergodic. In this paper, we give an information-spectrum analysis of return maximization in processes more general than stationary or ergodic Markov decision processes. We also present a class of stochastic sequential decision processes with the necessary condition for return maximization, and we provide several examples of sequences in this class that are best in terms of return maximization.