We describe an algorithm for learning in the presence of multiple criteria. Our technique generalizes previous approaches in that it learns, at once, optimal policies for all linear preference assignments over the multiple reward criteria. The algorithm can be viewed as an extension of standard reinforcement learning for MDPs: instead of repeatedly backing up maximal expected rewards, we back up the set of expected reward vectors that are maximal for some linear preference (given by a weight vector w). We present the algorithm along with a proof of correctness showing that our solution yields the optimal policy for any linear preference function. The solution reduces to the standard value iteration algorithm when a single weight vector w is fixed.
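The set-valued backup described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a tiny two-state, two-objective deterministic MDP, and it approximates the pruning step (keeping only vectors that are maximal for some linear preference) by checking maximality over a grid of sampled weight vectors rather than computing the exact convex hull.

```python
# Sketch of set-valued value iteration for a 2-objective MDP.
# Assumptions (not from the paper): a hypothetical deterministic MDP,
# and hull pruning approximated by a grid of weight vectors.
import numpy as np

GAMMA = 0.9
# transitions[s] = list of (next_state, reward_vector) pairs, one per action.
transitions = {
    0: [(0, np.array([1.0, 0.0])), (1, np.array([0.0, 0.0]))],
    1: [(1, np.array([0.0, 1.0])), (0, np.array([0.0, 0.0]))],
}

def prune(vectors, weights):
    """Keep each vector that is optimal for at least one weight vector w."""
    kept = []
    for w in weights:
        best = vectors[int(np.argmax([w @ v for v in vectors]))]
        if not any(np.allclose(best, k) for k in kept):
            kept.append(best)
    return kept

# Grid of linear preferences w = (a, 1 - a) over the two objectives.
weights = [np.array([a, 1.0 - a]) for a in np.linspace(0.0, 1.0, 11)]
# V[s] is a *set* of expected reward vectors, one per undominated policy.
V = {s: [np.zeros(2)] for s in transitions}

for _ in range(100):
    V_new = {}
    for s, acts in transitions.items():
        # Back up every combination of action and successor value vector,
        # then discard vectors that no linear preference would choose.
        candidates = [r + GAMMA * v for (s2, r) in acts for v in V[s2]]
        V_new[s] = prune(candidates, weights)
    V = V_new

# For w = (1, 0) the optimal policy loops in state 0, so the first
# component of the best vector approaches 1 / (1 - GAMMA) = 10.
best = max(V[0], key=lambda v: v[0])
print(best)
```

Fixing a single weight vector w (a one-element `weights` list) collapses each set `V[s]` to a single vector, recovering standard scalar value iteration, which mirrors the reduction noted in the abstract.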