Markov Decision Processes with Arbitrary Reward Processes
Mathematics of Operations Research
We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies, i.e., on their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm for the setting where the decision-maker's observations are limited to its own trajectory, and another version that allows a trade-off between performance and computational complexity.
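The regret notion in the abstract can be illustrated with a much simpler stateless example. The sketch below runs the exponential-weights (Hedge) update over a fixed set of candidate policies, each receiving an arbitrary per-round reward sequence, and reports the learner's regret against the best fixed policy in hindsight. This is illustrative only: the paper's algorithms additionally handle Markovian state dynamics and uncertain transition probabilities, which this sketch omits; the function name and learning rate are assumptions for the example.

```python
import math

def exp_weights_regret(reward_seqs, eta=0.1):
    """Run exponential weights over candidate 'policies' with arbitrary
    reward sequences; return regret vs. the best fixed policy in hindsight.
    reward_seqs: list of per-round reward lists, one per policy.
    Illustrative sketch only, not the paper's algorithm."""
    n = len(reward_seqs)          # number of candidate policies
    T = len(reward_seqs[0])       # number of rounds
    weights = [1.0] * n
    learner_reward = 0.0
    for t in range(T):
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected reward of the randomized choice at round t.
        learner_reward += sum(p * reward_seqs[i][t] for i, p in enumerate(probs))
        # Multiplicative update: favor policies that did well this round.
        for i in range(n):
            weights[i] *= math.exp(eta * reward_seqs[i][t])
    best_fixed = max(sum(seq) for seq in reward_seqs)
    return best_fixed - learner_reward
```

Even though the reward sequences may be chosen adversarially, the cumulative regret of this update grows sublinearly in the horizon, so the per-round regret vanishes; the paper's contribution is obtaining guarantees of this flavor when actions also drive an uncertain Markov chain.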