Playing stochastic games precisely
CONCUR'12 Proceedings of the 23rd international conference on Concurrency Theory
Pareto curves for probabilistic model checking
ATVA'12 Proceedings of the 10th international conference on Automated Technology for Verification and Analysis
Faster algorithms for Markov decision processes with low treewidth
CAV'13 Proceedings of the 25th international conference on Computer Aided Verification
Synthesis for multi-objective stochastic games: an application to autonomous urban driving
QEST'13 Proceedings of the 10th international conference on Quantitative Evaluation of Systems
Trading Performance for Stability in Markov Decision Processes
LICS '13 Proceedings of the 2013 28th Annual ACM/IEEE Symposium on Logic in Computer Science
We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k reward functions, under the expectation objective the goal is to maximize the expected limit-average vector, and under the satisfaction objective the goal is to maximize the probability of runs whose limit-average vector stays above a given threshold vector. We show that under the expectation objective, in contrast to the single-objective case, both randomization and memory are necessary for strategies, and that finite-memory randomized strategies are sufficient. Under the satisfaction objective, in contrast to the single-objective case, infinite memory is necessary for strategies, while randomized memoryless strategies are sufficient for epsilon-approximation, for all epsilon > 0. We further prove that the decision problems for both the expectation and satisfaction objectives can be solved in polynomial time, and that the trade-off curve (Pareto curve) can be epsilon-approximated in time polynomial in the size of the MDP and in 1/epsilon, and exponential in the number of reward functions, for all epsilon > 0. Our results also reveal flaws in previous work on MDPs with multiple mean-payoff functions under the expectation objective; we correct these flaws and obtain improved results.
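The necessity of randomization under the expectation objective can be seen already on a toy example (a hypothetical illustration, not taken from the paper): a single-state MDP with two self-loop actions whose reward vectors are (1, 0) and (0, 1). Any deterministic memoryless strategy achieves only the extreme points (1, 0) or (0, 1), while a memoryless strategy that randomizes with probability p achieves the expected limit-average vector (p, 1 - p), tracing out the Pareto curve between them. A minimal simulation sketch:

```python
import random

# Toy single-state MDP (illustrative assumption, not the paper's model):
# action a has reward vector (1, 0), action b has reward vector (0, 1),
# and both actions loop back to the same state.

def limit_average(p, steps=100_000, seed=0):
    """Empirical limit-average reward vector of the memoryless
    randomized strategy that plays action a with probability p."""
    rng = random.Random(seed)
    totals = [0.0, 0.0]
    for _ in range(steps):
        if rng.random() < p:   # play action a: reward (1, 0)
            totals[0] += 1.0
        else:                  # play action b: reward (0, 1)
            totals[1] += 1.0
    return [t / steps for t in totals]

# Deterministic strategies reach only the corner points:
# limit_average(1.0) -> [1.0, 0.0], limit_average(0.0) -> [0.0, 1.0].
# Randomizing with p = 0.5 approaches the balanced point (0.5, 0.5),
# which no deterministic memoryless strategy can achieve.
v = limit_average(0.5)
```

Here the single state makes the long-run average equal to the per-step expectation, so the simulation converges to (p, 1 - p); in a general MDP one would instead compute the stationary distribution induced by the strategy.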