Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options: whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches or joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and to plan and learn with them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (1998), and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered, by changing only their termination conditions, to improve over SMDP methods at no additional cost. We also introduce intra-option temporal-difference methods that are able to learn from fragments of an option's execution. Finally, we propose a notion of subgoal that can be used to improve the options themselves. Overall, we argue that options and their models provide hitherto missing aspects of a powerful, clear, and expressive framework for representing and organizing knowledge.
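To make the option construct concrete, here is a minimal Python sketch. It is illustrative only and is not code from the paper: the Option class, the toy corridor environment, and all names and parameters (run_option, smdp_q_update, intra_option_q_update, beta, alpha, gamma) are assumptions introduced here for exposition. It models an option as an (initiation set, policy, termination condition) triple, shows the standard SMDP Q-learning backup over a whole option execution, and sketches the intra-option backup that learns from single primitive steps, in the spirit of the methods the abstract describes.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

State = int
Action = int

@dataclass
class Option:
    """A temporally extended action: where it may start (initiation set),
    how it behaves while running (policy), and when it stops (termination)."""
    initiation_set: Set[State]
    policy: Callable[[State], Action]   # state -> primitive action
    beta: Callable[[State], float]      # state -> termination probability

def run_option(env_step, state: State, option: Option, gamma: float):
    """Execute an option until it terminates; return the discounted reward
    accumulated along the way, the resulting state, and the duration k."""
    total, discount, k = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward = env_step(state, action)
        total += discount * reward
        discount *= gamma
        k += 1
        if random.random() < option.beta(state):
            return total, state, k

def smdp_q_update(Q, s, o, reward, s_next, k, n_options, alpha=0.1, gamma=0.95):
    """SMDP Q-learning backup for the state-option pair (s, o):
    Q(s,o) += alpha * (R + gamma^k * max_o' Q(s',o') - Q(s,o))."""
    best_next = max(Q[(s_next, o2)] for o2 in range(n_options))
    Q[(s, o)] += alpha * (reward + gamma ** k * best_next - Q[(s, o)])

def intra_option_q_update(Q, s, reward, s_next, o_id, option, n_options,
                          alpha=0.1, gamma=0.95):
    """Intra-option Q-learning backup: learns from one primitive transition
    (s, a, r, s') taken while (or as if) the option was executing."""
    best_next = max(Q[(s_next, o2)] for o2 in range(n_options))
    # Blend continuing the option with terminating and choosing anew.
    u = (1 - option.beta(s_next)) * Q[(s_next, o_id)] + option.beta(s_next) * best_next
    Q[(s, o_id)] += alpha * (reward + gamma * u - Q[(s, o_id)])

# Toy corridor: states 0..4; the single primitive action moves right;
# reward 1.0 on reaching the goal state 4.
def env_step(s: State, a: Action) -> Tuple[State, float]:
    s2 = min(s + 1, 4)
    return s2, (1.0 if s2 == 4 else 0.0)

go_right = Option(initiation_set={0, 1, 2, 3},
                  policy=lambda s: 1,
                  beta=lambda s: 1.0 if s == 4 else 0.1)

Q: Dict[Tuple[State, int], float] = {(s, 0): 0.0 for s in range(5)}
for _ in range(200):
    R, s_next, k = run_option(env_step, 0, go_right, gamma=0.95)
    smdp_q_update(Q, 0, 0, R, s_next, k, n_options=1)
print(Q[(0, 0)])  # estimated value of starting go_right in state 0
```

The contrast the abstract draws is visible in the two update rules: smdp_q_update waits for the option to finish and backs up the reward accumulated over its k-step duration, while intra_option_q_update extracts a learning signal from every primitive transition, using the termination probability beta to weight continuing the option against terminating and re-selecting. Only the SMDP update is exercised in the toy loop above; the intra-option form is included to show how fragments of an execution can be used.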