Exploiting structure in policy construction

Authors:
Craig Boutilier;Richard Dearden;Moises Goldszmidt
Affiliations:
Department of Computer Science, University of British Columbia, Vancouver, BC, Canada;Department of Computer Science, University of British Columbia, Vancouver, BC, Canada;Rockwell Science Center, Palo Alto, CA
Venue:
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Year:
1995

Abstract

Markov decision processes (MDPs) have recently been applied to the problem of modeling decision-theoretic planning. While traditional methods for solving MDPs are often practical for small state spaces, their effectiveness for large AI planning problems is questionable. We present an algorithm, called structured policy iteration (SPI), that constructs optimal policies without explicit enumeration of the state space. The algorithm retains the fundamental computational steps of the commonly used modified policy iteration algorithm, but exploits the variable and propositional independencies reflected in a temporal Bayesian network representation of MDPs. The principles behind SPI can be applied to any structured representation of stochastic actions, policies and value functions, and the algorithm itself can be used in conjunction with recent approximation methods.
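
For orientation, the following is a minimal sketch of ordinary (flat) modified policy iteration, the baseline whose computational steps the abstract says SPI retains. It is not SPI: it enumerates every state explicitly, which is exactly what SPI is designed to avoid. The data layout (`P`, `R`), the function name, the parameter defaults and the two-state toy problem are all assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical flat representation (an assumption, not the paper's):
#   P[a][s] is a list of (next_state, probability) pairs for action a in state s
#   R[s]    is the immediate reward received in state s
Transitions = Dict[str, List[List[Tuple[int, float]]]]


def modified_policy_iteration(
    P: Transitions,
    R: List[float],
    gamma: float = 0.9,
    eval_sweeps: int = 5,
    tol: float = 1e-8,
    max_iters: int = 1000,
) -> Tuple[List[str], List[float]]:
    n = len(R)
    actions = list(P)
    V = [0.0] * n
    policy = [actions[0]] * n

    def backup(s: int, a: str, values: List[float]) -> float:
        # One-step expected value of taking action a in state s.
        return R[s] + gamma * sum(p * values[t] for t, p in P[a][s])

    for _ in range(max_iters):
        # Policy improvement: act greedily with respect to the current V.
        policy = [
            max(actions, key=lambda a, s=s: backup(s, a, V)) for s in range(n)
        ]
        U = [backup(s, policy[s], V) for s in range(n)]
        converged = max(abs(u - v) for u, v in zip(U, V)) < tol
        V = U
        if converged:
            break
        # Partial policy evaluation: a fixed number of successive-approximation
        # sweeps under the current policy, instead of solving it exactly.
        for _ in range(eval_sweeps):
            V = [backup(s, policy[s], V) for s in range(n)]
    return policy, V


# Toy two-state problem (hypothetical): state 1 is rewarding, state 0 is not.
P = {
    "stay": [[(0, 1.0)], [(1, 1.0)]],
    "move": [[(1, 0.9), (0, 0.1)], [(0, 0.9), (1, 0.1)]],
}
R = [0.0, 1.0]
policy, V = modified_policy_iteration(P, R)
```

Every list in this sketch has one entry per state, so its cost grows with the size of the state space. That is the cost the abstract says SPI avoids by working with a structured representation.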