Many stochastic planning problems can be represented using Markov Decision Processes (MDPs). A difficulty with using these MDP representations is that the common algorithms for solving them run in time polynomial in the size of the state space, and this size is extremely large for most real-world planning problems of interest. Recent AI research has addressed this problem by representing the MDP in factored form. Factored MDPs, however, are not amenable to traditional solution methods that call for an explicit enumeration of the state space. One familiar way to solve MDP problems with very large state spaces is to form a reduced (or aggregated) MDP with the same properties as the original MDP by combining "equivalent" states. In this paper, we discuss applying this approach to solving factored MDP problems: we avoid enumerating the state space by describing large blocks of "equivalent" states in factored form, with the block descriptions inferred directly from the original factored representation. The resulting reduced MDP may have exponentially fewer states than the original factored MDP and can then be solved using traditional methods. The reduced MDP obtained depends on the notion of equivalence between states used in the aggregation, and this choice is fundamental in designing and analyzing algorithms for reducing MDPs. Ideally, such an algorithm finds the smallest possible reduced MDP for any given input MDP and notion of equivalence (i.e., the "minimal model" for the input MDP). Unfortunately, the classic notion of state equivalence from non-deterministic finite state machines, when generalized to MDPs, does not prove useful. We present here a notion of equivalence that is based on the notion of bisimulation from the literature on concurrent processes.
Our generalization of bisimulation to stochastic processes yields a non-trivial notion of state equivalence that guarantees that the optimal policy for the reduced model immediately induces a corresponding optimal policy for the original model. With this notion of state equivalence, we design and analyze an algorithm that minimizes arbitrary factored MDPs, and we compare this method analytically to previous algorithms for solving factored MDPs. We show that previous approaches implicitly derive the equivalence relations we define here.
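The core aggregation idea can be sketched on an explicit (enumerated) MDP rather than a factored one: start from the partition induced by rewards, then repeatedly split blocks until any two states in the same block assign, for every action, the same probability mass to every block. This is a minimal illustrative sketch, not the paper's factored algorithm; the function name and data layout (`P[(s, a)]` as a next-state distribution, `R[s]` as reward) are assumptions for the example.

```python
def bisimulation_partition(states, actions, P, R):
    """Coarsest stochastic bisimulation via naive signature refinement.

    P[(s, a)] maps next-state -> probability; R[s] is the reward of s.
    Returns a dict mapping each state to a block id; states sharing an
    id are bisimilar and may be aggregated into one reduced-MDP state.
    """
    # Initial partition: group states by immediate reward.
    block = {s: R[s] for s in states}
    while True:
        blocks = set(block.values())

        def signature(s):
            # Current block of s plus, for each action, the probability
            # mass s sends into every block of the current partition.
            masses = tuple(
                tuple(sorted((b, sum(p for t, p in P[(s, a)].items()
                                     if block[t] == b))
                             for b in blocks))
                for a in actions)
            return (block[s], masses)

        sigs = {s: signature(s) for s in states}
        ids = {}
        new_block = {s: ids.setdefault(sigs[s], len(ids)) for s in states}
        # Splitting only ever refines the partition, so an unchanged
        # block count means the partition is stable.
        if len(ids) == len(blocks):
            return new_block
        block = new_block
```

The resulting quotient MDP (one state per block, with block-to-block transition mass and the shared block reward) can then be solved with standard value or policy iteration, and the optimal policy lifts back to the original states, as the abstract guarantees. The factored setting replaces the explicit per-state loop with symbolic block descriptions, which is where the exponential savings come from.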