If you have planned to achieve one particular goal in a stochastic delayed-rewards problem and someone then asks about a different goal, what should you do? What if you need to be ready to quickly supply an answer for any possible goal? This paper shows that, by using a new kind of automatically generated abstract action hierarchy, preparing for all N possible goals of an N-state problem can be much cheaper than N times the work of preparing for one goal. In goal-based Markov Decision Problems, it is usual to generate a policy π(x), mapping states to actions, and a value function J(x), mapping each state x to an estimate of the minimum expected cost-to-goal starting at x. In this paper we use the terminology that a multi-policy π*(x, y) (defined for all state pairs (x, y)) maps a state x to the first action it should take in order to reach y with minimum expected cost, and a multi-value function J*(x, y) gives this minimum cost. Building these objects quickly and with little memory is the main purpose of this paper; a secondary result is a natural, automatic way to create a set of parsimonious yet powerful abstract actions for MDPs. The paper concludes with a set of empirical results on increasingly large MDPs.
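To make the objects π*(x, y) and J*(x, y) concrete, the sketch below computes them by the naive baseline the abstract alludes to: one dynamic-programming solve per goal, roughly N times the cost of preparing for a single goal. The 5-state deterministic chain MDP is a hypothetical toy example, not from the paper.

```python
# Naive baseline: build the multi-value function J*(x, y) by running one
# value-iteration sweep sequence per goal y -- i.e. N separate solves.
# Toy problem (hypothetical): 5 states in a chain, actions move left/right,
# unit cost per step, moves off either end stay in place.

N = 5                  # number of states, 0..4
actions = [-1, +1]     # step left, step right

def step(x, a):
    # deterministic transition; clamp to the chain's ends
    return min(max(x + a, 0), N - 1)

def value_iteration(goal, sweeps=50):
    # J[x] converges to the minimum cost-to-goal from state x
    INF = float("inf")
    J = [INF] * N
    J[goal] = 0.0
    for _ in range(sweeps):
        for x in range(N):
            if x != goal:
                J[x] = min(1.0 + J[step(x, a)] for a in actions)
    return J

# Multi-value function: one solve per goal.
J_star = {y: value_iteration(y) for y in range(N)}

# Multi-policy: first action from x toward goal y, greedy w.r.t. J*(., y).
def multi_policy(x, y):
    return min(actions, key=lambda a: J_star[y][step(x, a)])

print(J_star[4])           # -> [4.0, 3.0, 2.0, 1.0, 0.0]
print(multi_policy(0, 4))  # -> 1  (move right toward goal 4)
```

For this N-state chain the per-goal solve is cheap, but in general the baseline repeats the full planning cost N times; the paper's contribution is an abstract action hierarchy that prepares for all N goals for much less than N independent solves.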