Stochastic systems: estimation, identification and adaptive control
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Optimizing Average Reward Using Discounted Rewards
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Self-Optimizing and Pareto-Optimal Policies in General Environments Based on Bayes-Mixtures
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Artificial Intelligence: A Modern Approach
Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability
Reinforcement Learning: A Survey
Journal of Artificial Intelligence Research
Universal Intelligence: A Definition of Machine Intelligence
Minds and Machines
On the possibility of learning in reactive environments with arbitrary dependence
Theoretical Computer Science
Measuring universal intelligence: Towards an anytime intelligence test
Artificial Intelligence
ALT'11 Proceedings of the 22nd international conference on Algorithmic learning theory
General time consistent discounting
Theoretical Computer Science
Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbitrary (non-geometric) discount sequences and arbitrary reward sequences (non-MDP environments). We show that asymptotically U for m→∞ and V for k→∞ are equal, provided both limits exist. Further, if the effective horizon grows linearly with k or faster, then the existence of the limit of U implies that the limit of V exists. Conversely, if the effective horizon grows linearly with k or slower, then existence of the limit of V implies that the limit of U exists.
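The relationship between the two value notions can be made concrete with a small numerical sketch (illustrative only, not from the paper; geometric discounting is assumed here, whereas the paper covers arbitrary summable discount sequences): for a reward sequence that converges, the average value U over cycles 1..m and the normalized discounted value V from cycle k onward approach the same limit.

```python
def average_value(rewards, m):
    """U_m: average reward over cycles 1..m."""
    return sum(rewards[:m]) / m

def discounted_value(rewards, k, gamma=0.99):
    """V_k: discounted reward from cycle k on, normalized by the
    remaining discount mass. Geometric discounting is an assumption
    of this sketch; the paper allows arbitrary discount sequences."""
    weights = [gamma ** i for i in range(k, len(rewards))]
    total = sum(w * r for w, r in zip(weights, rewards[k:]))
    return total / sum(weights)

# Example: a reward sequence tending to 1.
rewards = [1 - 1 / (i + 1) for i in range(10000)]
print(average_value(rewards, 10000))    # approaches 1 as m grows
print(discounted_value(rewards, 9000))  # approaches 1 as k grows
```

Both quantities converge to the same limit (here 1), consistent with the asymptotic equivalence stated in the abstract; the subtler horizon-growth conditions govern when the existence of one limit implies the other.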