We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesian network model of actions and constructs a causal graph that captures the influence relationships between state variables. In tasks with sparse causal graphs, VISA exploits this structure by introducing activities, temporally extended actions whose purpose is to change the values of state variables. The result is a hierarchy of activities that together represent a solution to the original task. VISA performs state abstraction for each activity by ignoring irrelevant state variables and lower-level activities. In addition, we describe an algorithm for constructing compact models of the introduced activities. State abstraction and compact activity models enable VISA to apply efficient algorithms to solve the stand-alone subtask associated with each activity. Experimental results show that the decomposition introduced by VISA can significantly accelerate construction of an optimal, or near-optimal, policy.
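As a rough illustration of the causal-graph step described above, here is a minimal Python sketch under assumptions of our own: DBN action models are summarized as parent sets (for each action and state variable, the set of variables whose current values influence that variable's next value), and the resulting graph is acyclic. The names (parents, build_causal_graph, the gridworld-style variables) are hypothetical and not VISA's actual implementation; VISA itself merges strongly connected components before ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical DBN action models summarized as parent sets:
# parents[action][var] = variables whose current values influence
# var's next value under that action (self-influence included).
parents = {
    "move":   {"location": {"location"},
               "key":      {"location", "key"}},
    "pickup": {"key":      {"location", "key"},
               "door":     {"key", "door"}},
}

def build_causal_graph(parents):
    """Causal graph as {variable: set of influencing variables}.
    There is an edge u -> v iff u appears in some action's parent
    set for v with u != v (self-loops are dropped)."""
    graph = {}
    for dbn in parents.values():
        for v, ps in dbn.items():
            graph.setdefault(v, set()).update(p for p in ps if p != v)
    return graph

graph = build_causal_graph(parents)
# graph == {"location": set(), "key": {"location"}, "door": {"key"}}

# TopologicalSorter maps each node to its predecessors, which is
# exactly how `graph` is oriented: u -> v means u influences v.
# The order suggests which variables' activities can be built first.
for var in TopologicalSorter(graph).static_order():
    print(var)   # location, key, door
```

In VISA's terms, each variable (or merged strongly connected component) in this ordering would then receive activities that change its value, and each activity's stand-alone subtask would be solved over only the variables that influence it in the graph, which is the state abstraction the abstract refers to.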