The standard framework of reinforcement learning maximizes the expected return defined by a scalar reward. In many real-world situations, however, trade-offs must be made among multiple objectives, and the agent's preferences over those objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their relative importance. We propose a method that stores a finite number of policies, chooses an appropriate stored policy for any given weight vector, and improves upon it. The key idea is that although there are infinitely many weight vectors, they may be well covered by a small number of optimal policies. We demonstrate this empirically in two domains: a version of the Buridan's ass problem and network routing.
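The policy-selection step described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes each stored policy comes with a known vector of expected returns, one entry per objective, and picks the policy whose weighted (scalarized) value is highest for a given weight vector. All names and the example values are hypothetical.

```python
# Hypothetical sketch of weight-based policy selection.
# stored_values[i] is the vector of expected returns of stored
# policy i, one entry per objective; w is the preference weight
# vector. Both are assumed known here for illustration.

def select_policy(stored_values, w):
    """Return the index of the stored policy maximizing w . V."""
    scores = [sum(wi * vi for wi, vi in zip(w, v)) for v in stored_values]
    return max(range(len(scores)), key=scores.__getitem__)

# Two stored policies over two objectives: the first is good on
# objective 0, the second on objective 1.
stored = [[10.0, 1.0], [2.0, 8.0]]

print(select_policy(stored, [0.9, 0.1]))  # weights favor objective 0 -> 0
print(select_policy(stored, [0.2, 0.8]))  # weights favor objective 1 -> 1
```

In the paper's setting the selected policy is only a starting point: it is then improved further for the current weight vector, and the intuition is that a small stored set suffices because many weight vectors share the same optimal policy.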