Learning to act using real-time dynamic programming
Artificial Intelligence - Special volume on computational research on interaction and agency, part 1
How to dynamically merge Markov decision processes
NIPS '97: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10
LAO*: a heuristic search algorithm that finds solutions with loops
Artificial Intelligence - Special issue on heuristic search in artificial intelligence
Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees
ICML '05: Proceedings of the 22nd International Conference on Machine Learning
An iterative algorithm for solving constrained decentralized Markov decision processes
AAAI '06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 2
Focused real-time dynamic programming for MDPs: squeezing more out of a heuristic
AAAI '06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 2
Faster heuristic search algorithms for planning with uncertainty and full feedback
IJCAI '03: Proceedings of the 18th International Joint Conference on Artificial Intelligence
This paper contributes to effectively solving stochastic resource allocation problems, which are known to be NP-complete. To address this complex resource management problem, a Q-decomposition approach is proposed for the case where the resources are already shared among the agents, but the actions taken by one agent may influence the reward obtained by at least one other agent. Q-decomposition makes it possible to coordinate these reward-separated agents and thus reduces the sets of states and actions to consider. On the other hand, when the resources are available to all agents, no Q-decomposition is possible and heuristic search is used instead. In particular, bounded real-time dynamic programming (bounded RTDP) is used. Bounded RTDP concentrates planning on significant states only and prunes the action space; the pruning is accomplished by maintaining tight upper and lower bounds on the value function.
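The bound-based pruning described above can be illustrated with a small sketch. The Python snippet below is a minimal, simplified illustration under assumptions of this note, not the authors' implementation: the toy cost-minimizing MDP, the bound initializations, and all function names are hypothetical, and successor sampling is weighted by the remaining bound gap as a rough stand-in for the full algorithm's outcome selection.

```python
# Minimal sketch of bound-based action pruning in the spirit of bounded RTDP.
# The toy MDP and all names below are hypothetical: a cost-minimizing MDP
# with a single absorbing goal state "g".
import random

transitions = {  # transitions[s][a] = list of (next_state, probability)
    "s0": {"a": [("s1", 0.8), ("s0", 0.2)], "b": [("s2", 1.0)]},
    "s1": {"a": [("g", 1.0)]},
    "s2": {"a": [("g", 0.5), ("s2", 0.5)]},
}
costs = {"s0": {"a": 1.0, "b": 3.0}, "s1": {"a": 1.0}, "s2": {"a": 1.0}}
goals = {"g"}

states = list(transitions) + ["g"]
lower = {s: 0.0 for s in states}                          # admissible lower bound
upper = {s: 0.0 if s in goals else 50.0 for s in states}  # valid upper bound

def q(bound, s, a):
    """Q-value of action a in state s under the given bound (Bellman backup)."""
    return costs[s][a] + sum(p * bound[t] for t, p in transitions[s][a])

def backup(s):
    """Tighten both bounds at s and return the actions that survive pruning:
    an action whose lower-bound Q-value already exceeds the upper bound on
    the state's value cannot be optimal and is discarded."""
    upper[s] = min(upper[s], min(q(upper, s, a) for a in transitions[s]))
    lower[s] = max(lower[s], min(q(lower, s, a) for a in transitions[s]))
    return [a for a in transitions[s] if q(lower, s, a) <= upper[s]]

def trial(start, gap=1e-3, max_depth=100):
    """One trial: follow actions that are greedy w.r.t. the lower bound,
    restricted to unpruned actions, sampling successors weighted by their
    remaining bound gap, until the bounds are tight enough."""
    s = start
    for _ in range(max_depth):
        if s in goals or upper[s] - lower[s] < gap:
            break
        candidates = backup(s)
        a = min(candidates, key=lambda act: q(lower, s, act))
        succ = transitions[s][a]
        weights = [p * (upper[t] - lower[t]) for t, p in succ]
        if sum(weights) < gap:   # successors are already well understood
            break
        s = random.choices([t for t, _ in succ], weights)[0]

for _ in range(300):
    trial("s0")
print(round(lower["s0"], 3), round(upper["s0"], 3))  # both approach 2.25 here
```

In this sketch, trials concentrate updates on states reached under greedy actions and on successors with a large remaining gap, while the pruning test in backup() drops any action whose lower-bound Q-value exceeds the state's upper bound, which is the bounds-based pruning idea the abstract refers to.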