The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle t, action y_t results in perception x_t and reward r_t, where all quantities in general may depend on the complete history. The perception x_t and reward r_t are sampled from the (reactive) environmental probability distribution µ. This very general setting includes, but is not limited to, (partially observable, k-th order) Markov decision processes. Sequential decision theory tells us how to act in order to maximize the total expected reward, called value, if µ is known. Reinforcement learning is usually used if µ is unknown. In the Bayesian approach one defines a mixture distribution ξ as a weighted sum of distributions ν ∈ M, where M is any class of distributions including the true environment µ. We show that the Bayes-optimal policy p^ξ based on the mixture ξ is self-optimizing in the sense that the average value converges asymptotically for all µ ∈ M to the optimal value achieved by the (infeasible) Bayes-optimal policy p^µ, which knows µ in advance. We show that the necessary condition that M admits self-optimizing policies at all is also sufficient. No other structural assumptions are made on M. As an example application, we discuss ergodic Markov decision processes, which allow for self-optimizing policies. Furthermore, we show that p^ξ is Pareto-optimal in the sense that there is no other policy yielding higher or equal value in all environments ν ∈ M and a strictly higher value in at least one.
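As a minimal sketch of the mixture construction ξ = Σ_ν w_ν ν, the toy below uses a finite class M of i.i.d. Bernoulli "environments" (a drastic simplification of the reactive environments in the abstract; the class, its parameters, and the function names are illustrative assumptions, not from the paper). It shows how Bayesian weight updates concentrate the posterior on the true environment µ ∈ M, which is the mechanism behind p^ξ asymptotically matching p^µ.

```python
import random

# Illustrative finite class M: each nu is an i.i.d. Bernoulli
# environment with P(x_t = 1) = p. Values are assumptions for the demo.
class_M = [0.2, 0.5, 0.8]
weights = [1.0 / len(class_M)] * len(class_M)  # uniform prior w_nu

def xi_prob_one(weights):
    """Mixture prediction xi(x_t = 1) = sum_nu w_nu * nu(x_t = 1)."""
    return sum(w * p for w, p in zip(weights, class_M))

def update(weights, x):
    """Posterior update w_nu <- w_nu * nu(x) / xi(x) after observing x."""
    likes = [p if x == 1 else 1.0 - p for p in class_M]
    evidence = sum(w * l for w, l in zip(weights, likes))
    return [w * l / evidence for w, l in zip(weights, likes)]

random.seed(0)
mu = 0.8  # true environment mu, assumed to lie in M
for _ in range(500):
    x = 1 if random.random() < mu else 0
    weights = update(weights, x)

# After enough observations the posterior mass sits on mu = 0.8,
# so the mixture's predictions (and hence decisions based on them)
# approach those of an agent that knew mu in advance.
print(weights, xi_prob_one(weights))
```

This captures only the prediction side; the paper's result concerns the value achieved by the policy p^ξ acting on the mixture, which additionally requires the planning step of sequential decision theory.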