Dynamic preferences in multi-criteria reinforcement learning

Authors: Sriraam Natarajan; Prasad Tadepalli
Affiliations: Oregon State University; Oregon State University
Venue: ICML '05 Proceedings of the 22nd international conference on Machine learning
Year: 2005



Abstract

The current framework of reinforcement learning maximizes expected returns computed from scalar rewards. But in many real-world situations, tradeoffs must be made among multiple objectives. Moreover, the agent's preferences among these objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance. We propose a method that stores a finite number of policies, chooses an appropriate stored policy for any given weight vector, and improves upon it. The idea is that although there are infinitely many weight vectors, they may be well covered by a small number of optimal policies. We show this empirically in two domains: a version of the Buridan's ass problem and network routing.
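
The policy-selection idea in the abstract can be illustrated with a minimal sketch. It assumes that each stored policy is summarized by a vector of per-objective values and that a weight vector scores a policy by the weighted sum of those values. The function names, the example value vectors, and the improvement threshold `delta` are all hypothetical and are not taken from the paper; the learning step that actually improves a policy is omitted.

```python
from typing import Dict, Sequence, Tuple

ValueVector = Sequence[float]


def weighted_value(weights: Sequence[float], values: ValueVector) -> float:
    """Scalarize a per-objective value vector with the given weights."""
    return sum(w * v for w, v in zip(weights, values))


def choose_policy(weights: Sequence[float],
                  stored: Dict[str, ValueVector]) -> Tuple[str, float]:
    """Return the stored policy with the highest weighted value, and that value."""
    best = max(stored, key=lambda name: weighted_value(weights, stored[name]))
    return best, weighted_value(weights, stored[best])


def maybe_store(weights: Sequence[float],
                stored: Dict[str, ValueVector],
                name: str,
                values: ValueVector,
                delta: float = 0.0) -> bool:
    """Keep a newly learned policy only if it beats the best stored one by more than delta.

    The threshold rule is an assumption made for this sketch.
    """
    _, best_value = choose_policy(weights, stored)
    if weighted_value(weights, values) - best_value > delta:
        stored[name] = values
        return True
    return False


# Hypothetical two-objective value vectors for three stored policies.
stored_policies = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.6, 0.6)}

balanced_choice, _ = choose_policy((0.5, 0.5), stored_policies)
skewed_choice, _ = choose_policy((0.9, 0.1), stored_policies)
```

In this toy setup, a balanced weight vector selects policy "C" and a weight vector skewed toward the first objective selects policy "A", which shows how a few stored policies can cover many different weight vectors.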