We describe an algorithm for learning in the presence of multiple criteria. Our technique generalizes previous approaches in that it learns, at once, optimal policies for all linear preference assignments over the multiple reward criteria. The algorithm can be viewed as an extension of standard reinforcement learning for MDPs: instead of repeatedly backing up maximal expected rewards, we back up the set of expected reward vectors that are maximal for some linear preference (given by a weight vector w). We present the algorithm along with a proof of correctness showing that our solution yields the optimal policy for any linear preference function. The solution reduces to the standard value iteration algorithm when a single weight vector w is fixed.
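The set-valued backup described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a tiny two-state, two-objective deterministic MDP, and it approximates the pruning step (keeping only vectors that are maximal for some linear preference) by checking maximality over a grid of sampled weight vectors rather than computing the exact convex hull.

```python
# Sketch of set-valued value iteration for a 2-objective MDP.
# Assumptions (not from the paper): a hypothetical deterministic MDP,
# and hull pruning approximated by a grid of weight vectors.
import numpy as np

GAMMA = 0.9
# transitions[s] = list of (next_state, reward_vector) pairs, one per action.
transitions = {
    0: [(0, np.array([1.0, 0.0])), (1, np.array([0.0, 0.0]))],
    1: [(1, np.array([0.0, 1.0])), (0, np.array([0.0, 0.0]))],
}

def prune(vectors, weights):
    """Keep each vector that is optimal for at least one weight vector w."""
    kept = []
    for w in weights:
        best = vectors[int(np.argmax([w @ v for v in vectors]))]
        if not any(np.allclose(best, k) for k in kept):
            kept.append(best)
    return kept

# Grid of linear preferences w = (a, 1 - a) over the two objectives.
weights = [np.array([a, 1.0 - a]) for a in np.linspace(0.0, 1.0, 11)]
# V[s] is a *set* of expected reward vectors, one per undominated policy.
V = {s: [np.zeros(2)] for s in transitions}

for _ in range(100):
    V_new = {}
    for s, acts in transitions.items():
        # Back up every combination of action and successor value vector,
        # then discard vectors that no linear preference would choose.
        candidates = [r + GAMMA * v for (s2, r) in acts for v in V[s2]]
        V_new[s] = prune(candidates, weights)
    V = V_new

# For w = (1, 0) the optimal policy loops in state 0, so the first
# component of the best vector approaches 1 / (1 - GAMMA) = 10.
best = max(V[0], key=lambda v: v[0])
print(best)
```

Fixing a single weight vector w (a one-element `weights` list) collapses each set `V[s]` to a single vector, recovering standard scalar value iteration, which mirrors the reduction noted in the abstract.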