If you have planned to achieve one particular goal in a stochastic delayed-rewards problem and someone then asks about a different goal, what should you do? What if you need to be ready to quickly supply an answer for any possible goal? This paper shows that, by using a new kind of automatically generated abstract action hierarchy, preparing for all N possible goals of an N-state problem can be much cheaper than N times the work of preparing for one goal. In goal-based Markov Decision Problems, it is usual to generate a policy π(x), mapping states to actions, and a value function J(x), mapping each state x to an estimate of the minimum expected cost-to-goal starting at x. In this paper we use the terminology that a multi-policy π*(x, y) (defined for all state pairs (x, y)) maps a state x to the first action it should take in order to reach y with minimum expected cost, and a multi-value function J*(x, y) gives this minimum cost. Building these objects quickly and with little memory is the main purpose of this paper; a secondary result is a natural, automatic way to create a set of parsimonious yet powerful abstract actions for MDPs. The paper concludes with a set of empirical results on increasingly large MDPs.
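To make the objects π*(x, y) and J*(x, y) concrete, the sketch below computes them by the naive baseline the abstract alludes to: one dynamic-programming solve per goal, roughly N times the cost of preparing for a single goal. The 5-state deterministic chain MDP is a hypothetical toy example, not from the paper.

```python
# Naive baseline: build the multi-value function J*(x, y) by running one
# value-iteration sweep sequence per goal y -- i.e. N separate solves.
# Toy problem (hypothetical): 5 states in a chain, actions move left/right,
# unit cost per step, moves off either end stay in place.

N = 5                  # number of states, 0..4
actions = [-1, +1]     # step left, step right

def step(x, a):
    # deterministic transition; clamp to the chain's ends
    return min(max(x + a, 0), N - 1)

def value_iteration(goal, sweeps=50):
    # J[x] converges to the minimum cost-to-goal from state x
    INF = float("inf")
    J = [INF] * N
    J[goal] = 0.0
    for _ in range(sweeps):
        for x in range(N):
            if x != goal:
                J[x] = min(1.0 + J[step(x, a)] for a in actions)
    return J

# Multi-value function: one solve per goal.
J_star = {y: value_iteration(y) for y in range(N)}

# Multi-policy: first action from x toward goal y, greedy w.r.t. J*(., y).
def multi_policy(x, y):
    return min(actions, key=lambda a: J_star[y][step(x, a)])

print(J_star[4])           # -> [4.0, 3.0, 2.0, 1.0, 0.0]
print(multi_policy(0, 4))  # -> 1  (move right toward goal 4)
```

For this N-state chain the per-goal solve is cheap, but in general the baseline repeats the full planning cost N times; the paper's contribution is an abstract action hierarchy that prepares for all N goals for much less than N independent solves.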