Model reduction techniques for computing approximately optimal solutions for Markov decision processes

Authors:
Thomas Dean;Robert Givan;Sonia Leach
Affiliations:
Department of Computer Science, Brown University;Department of Computer Science, Brown University;Department of Computer Science, Brown University
Venue:
UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
Year:
1997

Abstract

We present a method for solving implicit (factored) Markov decision processes (MDPs) with very large state spaces. We introduce a property of state space partitions which we call ε-homogeneity. Intuitively, an ε-homogeneous partition groups together states that behave approximately the same under all or some subset of policies. Borrowing from recent work on model minimization in computer-aided software verification, we present an algorithm that takes a factored representation of an MDP and a value ε with 0 ≤ ε ≤ 1 and computes a factored ε-homogeneous partition of the state space. This partition defines a family of related MDPs: those MDPs with state space equal to the blocks of the partition, and transition probabilities "approximately" like those of any (original MDP) state in the source block. To formally study such families of MDPs, we introduce the new notion of a "bounded parameter MDP" (BMDP), which is a family of (traditional) MDPs defined by specifying upper and lower bounds on the transition probabilities and rewards. We describe algorithms that operate on BMDPs to find policies that are approximately optimal with respect to the original MDP. In combination, our method for reducing a large implicit MDP to a possibly much smaller BMDP using an ε-homogeneous partition and our methods for selecting actions in BMDPs constitute a new approach for analyzing large implicit MDPs. Among its advantages, this new approach provides insight into existing algorithms for solving implicit MDPs, provides useful connections to work in automata theory and model minimization, and suggests methods, which involve varying ε, to trade time and space (specifically in terms of the size of the corresponding state space) for solution quality.
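
The abstract does not spell out the formal definitions, so the following Python sketch illustrates one plausible reading of them rather than the authors' algorithm. It assumes that a partition is ε-homogeneous when, for every action and every pair of blocks, the probability of moving into the destination block differs by at most ε across the states of the source block, and that the BMDP transition bounds are the minimum and maximum of those block probabilities. Rewards are ignored, and the MDP is enumerated explicitly rather than factored. All names (`block_prob`, `is_eps_homogeneous`, `interval_bounds`) and the toy MDP are hypothetical.

```python
def block_prob(P, s, a, block):
    """Probability of moving from state s into any state of `block` under action a."""
    return sum(P[s][a].get(t, 0.0) for t in block)


def is_eps_homogeneous(P, actions, partition, eps):
    """Check the assumed condition: within each source block, block transition
    probabilities into each destination block differ by at most eps."""
    for src in partition:
        for dst in partition:
            for a in actions:
                probs = [block_prob(P, s, a, dst) for s in src]
                if max(probs) - min(probs) > eps:
                    return False
    return True


def interval_bounds(P, actions, partition):
    """Lower and upper bounds on block-to-block transition probabilities,
    keyed by (source block index, action, destination block index)."""
    bounds = {}
    for i, src in enumerate(partition):
        for j, dst in enumerate(partition):
            for a in actions:
                probs = [block_prob(P, s, a, dst) for s in src]
                bounds[(i, a, j)] = (min(probs), max(probs))
    return bounds


# Toy explicit MDP (an assumption for illustration): P[state][action] maps
# successor states to probabilities.
P = {
    0: {"a": {0: 0.25, 2: 0.75}},
    1: {"a": {1: 0.5, 2: 0.5}},
    2: {"a": {2: 1.0}},
}
actions = ["a"]
partition = [[0, 1], [2]]

print(is_eps_homogeneous(P, actions, partition, 0.3))
print(is_eps_homogeneous(P, actions, partition, 0.2))
print(interval_bounds(P, actions, partition)[(0, "a", 1)])
```

In this toy case, states 0 and 1 reach block `[2]` with probabilities 0.75 and 0.5, so the partition passes the check for ε = 0.3 but not for ε = 0.2. A smaller ε forces a finer partition with tighter intervals, which is the time-and-space versus solution-quality trade-off the abstract mentions.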