Learning to act using real-time dynamic programming
Artificial Intelligence - Special volume on computational research on interaction and agency, part 1
How to dynamically merge Markov decision processes
NIPS '97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10
LAO*: a heuristic search algorithm that finds solutions with loops
Artificial Intelligence - Special issue on heuristic search in artificial intelligence
Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees
ICML '05: Proceedings of the 22nd International Conference on Machine Learning
An iterative algorithm for solving constrained decentralized Markov decision processes
AAAI '06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 2
Focused real-time dynamic programming for MDPs: squeezing more out of a heuristic
AAAI '06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 2
Faster heuristic search algorithms for planning with uncertainty and full feedback
IJCAI '03: Proceedings of the 18th International Joint Conference on Artificial Intelligence
This paper contributes to effectively solving stochastic resource allocation problems, which are known to be NP-complete. To address this complex resource management problem, a Q-decomposition approach is proposed for the case where the resources are already shared among the agents, but the actions taken by one agent may influence the reward obtained by at least one other agent. Q-decomposition makes it possible to coordinate these reward-separated agents and thus reduces the sets of states and actions to consider. On the other hand, when the resources are available to all agents, no Q-decomposition is possible and heuristic search is used instead. In particular, bounded real-time dynamic programming (bounded RTDP) is used. Bounded RTDP concentrates planning on significant states only and prunes the action space; the pruning is accomplished by maintaining tight upper and lower bounds on the value function.
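The bound-based pruning described above can be illustrated with a small sketch. The Python snippet below is a minimal, simplified illustration under assumptions of this note, not the authors' implementation: the toy cost-minimizing MDP, the bound initializations, and all function names are hypothetical, and successor sampling is weighted by the remaining bound gap as a rough stand-in for the full algorithm's outcome selection.

```python
# Minimal sketch of bound-based action pruning in the spirit of bounded RTDP.
# The toy MDP and all names below are hypothetical: a cost-minimizing MDP
# with a single absorbing goal state "g".
import random

transitions = {  # transitions[s][a] = list of (next_state, probability)
    "s0": {"a": [("s1", 0.8), ("s0", 0.2)], "b": [("s2", 1.0)]},
    "s1": {"a": [("g", 1.0)]},
    "s2": {"a": [("g", 0.5), ("s2", 0.5)]},
}
costs = {"s0": {"a": 1.0, "b": 3.0}, "s1": {"a": 1.0}, "s2": {"a": 1.0}}
goals = {"g"}

states = list(transitions) + ["g"]
lower = {s: 0.0 for s in states}                          # admissible lower bound
upper = {s: 0.0 if s in goals else 50.0 for s in states}  # valid upper bound

def q(bound, s, a):
    """Q-value of action a in state s under the given bound (Bellman backup)."""
    return costs[s][a] + sum(p * bound[t] for t, p in transitions[s][a])

def backup(s):
    """Tighten both bounds at s and return the actions that survive pruning:
    an action whose lower-bound Q-value already exceeds the upper bound on
    the state's value cannot be optimal and is discarded."""
    upper[s] = min(upper[s], min(q(upper, s, a) for a in transitions[s]))
    lower[s] = max(lower[s], min(q(lower, s, a) for a in transitions[s]))
    return [a for a in transitions[s] if q(lower, s, a) <= upper[s]]

def trial(start, gap=1e-3, max_depth=100):
    """One trial: follow actions that are greedy w.r.t. the lower bound,
    restricted to unpruned actions, sampling successors weighted by their
    remaining bound gap, until the bounds are tight enough."""
    s = start
    for _ in range(max_depth):
        if s in goals or upper[s] - lower[s] < gap:
            break
        candidates = backup(s)
        a = min(candidates, key=lambda act: q(lower, s, act))
        succ = transitions[s][a]
        weights = [p * (upper[t] - lower[t]) for t, p in succ]
        if sum(weights) < gap:   # successors are already well understood
            break
        s = random.choices([t for t, _ in succ], weights)[0]

for _ in range(300):
    trial("s0")
print(round(lower["s0"], 3), round(upper["s0"], 3))  # both approach 2.25 here
```

In this sketch, trials concentrate updates on states reached under greedy actions and on successors with a large remaining gap, while the pruning test in backup() drops any action whose lower-bound Q-value exceeds the state's upper bound, which is the bounds-based pruning idea the abstract refers to.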