Hierarchical reinforcement learning with the MAXQ value function decomposition

Authors:
Thomas G. Dietterich
Affiliations:
Department of Computer Science, Oregon State University, Corvallis, OR
Venue:
Journal of Artificial Intelligence Research
Year:
2000

Abstract

This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics (as a subroutine hierarchy) and a declarative semantics (as a representation of the value function of a hierarchical policy). MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consistent with the given hierarchy. The decomposition also creates opportunities to exploit state abstractions, so that individual MDPs within the hierarchy can ignore large parts of the state space. This is important for the practical application of the method. This paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q through a series of experiments in three domains and shows experimentally that MAXQ-Q (with state abstractions) converges to a recursively optimal policy much faster than flat Q-learning.
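The additive decomposition described above can be written out as a short worked form. The symbols are not given in the abstract and are introduced here as assumptions, following notation commonly used for MAXQ: $V^{\pi}(i, s)$ is the expected return of executing subtask $i$ from state $s$ under a hierarchical policy $\pi$, $\pi_i(s)$ is the child that subtask $i$ selects in $s$, and $C^{\pi}(i, s, a)$ (the completion function) is the expected discounted reward for finishing subtask $i$ after its child $a$ terminates.

```latex
\begin{align}
Q^{\pi}(i, s, a) &= V^{\pi}(a, s) + C^{\pi}(i, s, a) \\
V^{\pi}(i, s) &=
\begin{cases}
Q^{\pi}\bigl(i, s, \pi_i(s)\bigr) & \text{if $i$ is a composite subtask} \\
\sum_{s'} P(s' \mid s, i)\, R(s' \mid s, i) & \text{if $i$ is a primitive action}
\end{cases}
\end{align}

% Unrolling the recursion along the path Root = 0, a_1, ..., a_m
% (a_m primitive) chosen by the hierarchical policy in state s:
\begin{equation}
V^{\pi}(0, s) = V^{\pi}(a_m, s) + C^{\pi}(a_{m-1}, s, a_m) + \cdots + C^{\pi}(a_1, s, a_2) + C^{\pi}(0, s, a_1)
\end{equation}
```

Read this way, the root value is the expected one-step reward of a single primitive action plus one completion term per level of the hierarchy, which is the sense in which the value function is an additive combination of the value functions of the smaller MDPs.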
The fact that MAXQ learns a representation of the value function has an important benefit: it makes it possible to compute and execute an improved, non-hierarchical policy via a procedure similar to the policy improvement step of policy iteration. The paper demonstrates the effectiveness of this non-hierarchical execution experimentally. Finally, the paper concludes with a comparison to related work and a discussion of the design tradeoffs in hierarchical reinforcement learning.
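A minimal sketch of how a decomposed value function might be evaluated, and how a greedy primitive action could be read off it at every step, is given below. It is an illustration under stated assumptions, not the paper's implementation: the hierarchy (`Root`, `Go`, `left`, `right`, `stay`), the single state `0`, and all table values are hypothetical, and termination predicates and state abstraction are omitted. It assumes only that the value of a composite subtask is the maximum over its children of the child's value plus a completion term, and that the value of a primitive action is its expected one-step reward.

```python
from typing import Dict, List, Tuple

State = int
Node = str

# Hypothetical hierarchy: Root may invoke the subtask "Go" or the primitive
# "stay"; "Go" may invoke the primitives "left" and "right".
CHILDREN: Dict[Node, List[Node]] = {
    "Root": ["Go", "stay"],
    "Go": ["left", "right"],
}

# Hypothetical expected one-step rewards for primitive actions, keyed by
# (action, state).
PRIMITIVE_V: Dict[Tuple[Node, State], float] = {
    ("left", 0): -1.0,
    ("right", 0): -1.0,
    ("stay", 0): -1.0,
}

# Hypothetical completion values, keyed by (parent, state, child): expected
# reward for finishing `parent` after `child` terminates.
C: Dict[Tuple[Node, State, Node], float] = {
    ("Go", 0, "left"): -3.0,
    ("Go", 0, "right"): -1.0,
    ("Root", 0, "Go"): -2.0,
    ("Root", 0, "stay"): -10.0,
}


def is_primitive(node: Node) -> bool:
    """A node with no children in the hierarchy is a primitive action."""
    return node not in CHILDREN


def value(node: Node, state: State) -> float:
    """Value of `node` in `state`, computed recursively from the tables."""
    if is_primitive(node):
        return PRIMITIVE_V[(node, state)]
    return max(q(node, state, child) for child in CHILDREN[node])


def q(parent: Node, state: State, child: Node) -> float:
    """Value of invoking `child` inside `parent`: child value plus completion."""
    return value(child, state) + C[(parent, state, child)]


def greedy_path(root: Node, state: State) -> List[Node]:
    """Descend from `root`, picking the best child at each level.

    The last element is the primitive action to execute. Recomputing this
    path from the root at every time step, instead of running each subtask
    until it terminates, is the idea behind non-hierarchical execution.
    """
    path = [root]
    parent = root
    while not is_primitive(parent):
        parent = max(CHILDREN[parent], key=lambda child: q(parent, state, child))
        path.append(parent)
    return path


if __name__ == "__main__":
    print(value("Root", 0))
    print(greedy_path("Root", 0))
```

With these hypothetical tables, the root value is the sum of one primitive reward and one completion term per level along the greedy path, so the stored quantities stay local to each subtask while the overall value is still recoverable.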