We present a general analysis of return maximization in reinforcement learning. Our analysis does not require the Markov, stationarity, or ergodicity assumptions usually imposed on the stochastic sequential decision processes of reinforcement learning. Instead, it assumes only the asymptotic equipartition property (AEP) fundamental to information theory, providing a view substantially different from that in the literature. As our main results, we show that return maximization is achieved when the set of typical sequences overlaps the set of best sequences, and we present a class of stochastic sequential decision processes satisfying a necessary condition for return maximization. We also describe several examples of best sequences, in the sense of return maximization, within this class of processes.
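The AEP underlying this analysis can be illustrated with a minimal sketch (not the paper's setting): for a simple i.i.d. Bernoulli source, almost every long sample sequence is "typical", i.e., its empirical log-probability rate concentrates around the source entropy. The source parameters and sequence lengths below are arbitrary choices for illustration only.

```python
import math
import random

random.seed(0)

def empirical_rate(seq, p):
    # -(1/n) * log2 Pr(seq) under an i.i.d. Bernoulli(p) source.
    logp = sum(math.log2(p) if s == 1 else math.log2(1 - p) for s in seq)
    return -logp / len(seq)

p, n = 0.3, 2000
# Binary entropy of the source, about 0.881 bits per symbol.
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rates = [
    empirical_rate([1 if random.random() < p else 0 for _ in range(n)], p)
    for _ in range(200)
]

# By the AEP, nearly all sampled sequences land in the typical set:
# their empirical rate lies within a small epsilon of the entropy H.
frac_typical = sum(abs(r - H) < 0.05 for r in rates) / len(rates)
print(round(H, 3), round(frac_typical, 2))
```

The same concentration phenomenon, applied to action-observation sequences rather than source symbols, is what lets typical sequence sets be compared against best sequence sets in the paper's argument.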