Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Self-Optimizing and Pareto-Optimal Policies in General Environments Based on Bayes-Mixtures
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability
Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability
Universal Intelligence: A Definition of Machine Intelligence
Minds and Machines
Artificial Intelligence: A Modern Approach
Artificial Intelligence: A Modern Approach
General discounting versus average reward
ALT'06 Proceedings of the 17th international conference on Algorithmic Learning Theory
Hi-index | 5.23 |
Modeling inter-temporal choice is a key problem in both computer science and economic theory. The discounted utility model of Samuelson is currently the most popular model for measuring the global utility of a time-series of local utilities. The model is limited by not allowing the discount function to change with the age of the agent. This is despite the fact that many agents, in particular humans, are best modelled with age-dependent discount functions. It is well known that discounting can lead to time-inconsistent behaviour where agents change their preferences over time. In this paper we generalise the discounted utility model to allow age-dependent discount functions. We then extend previous work in time-inconsistency to our new setting, including a complete characterisation of time-(in)consistent discount functions, the existence of sub-game perfect equilibrium policies where the discount function is time-inconsistent and a continuity result showing that ''nearly'' time-consistent discount rates lead to ''nearly'' time-consistent behaviour.