We present a general analysis of return maximization in reinforcement learning. Our analysis does not require the Markov, stationarity, or ergodicity assumptions usually imposed on the stochastic sequential decision processes of reinforcement learning. Instead, it assumes only the asymptotic equipartition property (AEP) fundamental to information theory, providing a view substantially different from that in the literature. As our main results, we show that return maximization is achieved when the set of typical sequences overlaps the set of best sequences, and we present a class of stochastic sequential decision processes satisfying a necessary condition for return maximization. We also describe several examples of best sequences, in the sense of return maximization, within this class of processes.
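The AEP underlying this analysis can be illustrated with a minimal sketch (not the paper's setting): for a simple i.i.d. Bernoulli source, almost every long sample sequence is "typical", i.e., its empirical log-probability rate concentrates around the source entropy. The source parameters and sequence lengths below are arbitrary choices for illustration only.

```python
import math
import random

random.seed(0)

def empirical_rate(seq, p):
    # -(1/n) * log2 Pr(seq) under an i.i.d. Bernoulli(p) source.
    logp = sum(math.log2(p) if s == 1 else math.log2(1 - p) for s in seq)
    return -logp / len(seq)

p, n = 0.3, 2000
# Binary entropy of the source, about 0.881 bits per symbol.
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rates = [
    empirical_rate([1 if random.random() < p else 0 for _ in range(n)], p)
    for _ in range(200)
]

# By the AEP, nearly all sampled sequences land in the typical set:
# their empirical rate lies within a small epsilon of the entropy H.
frac_typical = sum(abs(r - H) < 0.05 for r in rates) / len(rates)
print(round(H, 3), round(frac_typical, 2))
```

The same concentration phenomenon, applied to action-observation sequences rather than source symbols, is what lets typical sequence sets be compared against best sequence sets in the paper's argument.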