On minimizing ordered weighted regrets in multiobjective Markov decision processes

  • Authors:
  • Włodzimierz Ogryczak, Patrice Perny, Paul Weng

  • Affiliations:
  • ICCE, Warsaw University of Technology, Warsaw, Poland; LIP6 - UPMC, Paris, France; LIP6 - UPMC, Paris, France

  • Venue:
  • ADT'11: Proceedings of the Second International Conference on Algorithmic Decision Theory
  • Year:
  • 2011

Abstract

In this paper, we propose an exact solution method for generating fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on determining a policy that fairly shares regrets among agents or objectives, the regret on each dimension being defined as the opportunity loss with respect to the optimal expected reward on that dimension. To this end, we propose to minimize the ordered weighted average of regrets (OWR). The OWR criterion extends minimax regret, relaxing strict egalitarianism into a milder notion of fairness. After showing that OWR-optimality is state-dependent and that the Bellman principle does not hold for OWR-optimal policies, we propose a linear programming reformulation of the problem. We also provide experimental results demonstrating the efficiency of our approach.
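
To make the criterion concrete, here is a minimal sketch of how the ordered weighted regret of a policy's value vector could be computed, assuming the per-objective expected rewards of the policy and the per-objective optima are already known; the function name and inputs are illustrative, not the authors' implementation or the paper's LP formulation.

```python
# Minimal sketch of the OWR criterion described above (illustrative names,
# not the authors' implementation). Regrets are opportunity losses with
# respect to the per-objective optima; OWR sorts them in decreasing order,
# so the largest regrets receive the largest weights.

def owr(values, optima, weights):
    """Ordered weighted average of regrets of a policy's value vector.

    values  -- expected reward of the evaluated policy on each objective
    optima  -- optimal expected reward achievable on each objective alone
    weights -- nonnegative weights, assumed decreasing and summing to 1
    """
    regrets = sorted((opt - val for val, opt in zip(values, optima)),
                     reverse=True)
    return sum(w * r for w, r in zip(weights, regrets))

# Two objectives: regrets are (10-8, 6-5) = (2.0, 1.0), already sorted.
# OWR = 0.7 * 2.0 + 0.3 * 1.0 = 1.7
print(owr(values=[8.0, 5.0], optima=[10.0, 6.0], weights=[0.7, 0.3]))
```

With the weight vector (1, 0, ..., 0) this reduces to minimax regret; strictly decreasing positive weights give the milder notion of fairness the abstract refers to.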