We discuss an important property of empirical sequences in reinforcement learning: the asymptotic equipartition property. It states that, when the number of time steps is sufficiently large, the typical set of empirical sequences has probability nearly one, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set grows exponentially in the sum of conditional entropies. We refer to this sum as the stochastic complexity. Using this property, we show that return maximization depends on two factors: the stochastic complexity and a quantity determined by the parameters of the environment. Here, return maximization means that the sequences that are best in terms of expected return have probability one. We also examine the sensitivity of the stochastic complexity, which provides a qualitative guide for tuning the parameters of the action-selection strategy, and give a sufficient condition for return maximization in probability.
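The asymptotic equipartition property underlying this analysis can be illustrated numerically. The sketch below is not taken from the paper; it uses the simplest possible source, an i.i.d. Bernoulli(p) sequence rather than the paper's empirical state-action sequences, to show the core phenomenon: the per-symbol log-probability -(1/n) log2 P(sequence) concentrates near the entropy H(p), so long sequences split into a typical set of nearly equiprobable elements whose size is roughly 2^(n*H). All names and parameter values here are illustrative assumptions.

```python
import math
import random

random.seed(0)


def entropy(p):
    # Binary entropy H(p) in bits.
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def neg_log_prob_rate(seq, p):
    # -(1/n) log2 of the probability of this particular binary sequence
    # under an i.i.d. Bernoulli(p) model.
    n = len(seq)
    ones = sum(seq)
    return -(ones * math.log2(p) + (n - ones) * math.log2(1 - p)) / n


# Illustrative parameters (assumed, not from the paper): source bias p,
# sequence length n, number of sampled sequences, and typicality margin eps.
p, n, trials, eps = 0.3, 10_000, 200, 0.05
H = entropy(p)

rates = [
    neg_log_prob_rate([1 if random.random() < p else 0 for _ in range(n)], p)
    for _ in range(trials)
]

# Fraction of sampled sequences falling in the eps-typical set,
# i.e. whose per-symbol log-probability is within eps of H(p).
typical_fraction = sum(abs(r - H) < eps for r in rates) / trials
print(f"H(p) = {H:.4f}, fraction of sampled sequences that are typical: {typical_fraction:.2f}")
```

For these parameters essentially every sampled sequence is typical, which mirrors the abstract's statement that the typical set has probability nearly one; in the paper's setting the role of H(p) is played by the sum of conditional entropies, i.e. the stochastic complexity.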