Bisimulation Metrics for Continuous Markov Decision Processes

Authors:
Norm Ferns; Prakash Panangaden; Doina Precup
Affiliations:
norm.ferns@normferns.com; prakash@sc.mcgill.ca; dprecup@cs.mcgill.ca
Venue:
SIAM Journal on Computing
Year:
2011

References (39):

Algebraic laws for nondeterminism and concurrency. Journal of the ACM (JACM)
A faster strongly polynomial minimum cost flow algorithm. STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Bisimulation through probabilistic testing. Information and Computation
The formal semantics of programming languages: an introduction
Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM (JACM)
How does the value function of a Markov decision process depend on the transition probabilities? Mathematics of Operations Research
On dual minimum cost flow algorithms (extended abstract). STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Stochastic dynamic programming with factored representations. Artificial Intelligence
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Communication and Concurrency
A Calculus of Communicating Systems
Introduction to Reinforcement Learning
Performance measure sensitive congruences for Markovian process algebras. Theoretical Computer Science
The Metric Analogue of Weak Bisimulation for Probabilistic Processes. LICS '02 Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science
Towards Quantitative Verification of Probabilistic Transition Systems. ICALP '01 Proceedings of the 28th International Colloquium on Automata, Languages and Programming
Metrics for Labeled Markov Systems. CONCUR '99 Proceedings of the 10th International Conference on Concurrency Theory
An Algorithm for Quantitative Verification of Probabilistic Transition Systems. CONCUR '01 Proceedings of the 12th International Conference on Concurrency Theory
Concurrency and Automata on Infinite Sequences. Proceedings of the 5th GI-Conference on Theoretical Computer Science
Bisimulation for labelled Markov processes. Information and Computation - Special issue: LICS'97
Bisimulation for Labelled Markov Processes. LICS '97 Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science
A probabilistic PDL. STOC '83 Proceedings of the fifteenth annual ACM symposium on Theory of computing
Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence - special issue on planning with uncertainty and incomplete information
Labelled Markov processes
When Scott is weak on the top. Mathematical Structures in Computer Science
Coffee, Tea, or ...?: A Markov Decision Process Model for Airline Meal Provisioning. Transportation Science
Metrics for labelled Markov processes. Theoretical Computer Science - Logic, semantics and theory of programming
Metrics for finite Markov decision processes. UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
An approximation algorithm for labelled Markov processes: towards realistic approximation. QEST '05 Proceedings of the Second International Conference on the Quantitative Evaluation of Systems
A Computational Study of Cost Reoptimization for Min-Cost Flow Problems. INFORMS Journal on Computing
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes. ALT '07 Proceedings of the 18th international conference on Algorithmic Learning Theory
Exploiting structure in policy construction. IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Planning and acting in partially observable stochastic domains. Artificial Intelligence
Labelled Markov Processes
State-similarity metrics for continuous Markov decision processes
Approximating a behavioural pseudometric without discount for probabilistic systems. FOSSACS'07 Proceedings of the 10th international conference on Foundations of software science and computational structures
Flow faster: efficient decision algorithms for probabilistic simulations. TACAS'07 Proceedings of the 13th international conference on Tools and algorithms for the construction and analysis of systems
Simulation hemi-metrics between infinite-state stochastic games. FOSSACS'08/ETAPS'08 Proceedings of the Theory and practice of software, 11th international conference on Foundations of software science and computational structures
Model minimization in Markov decision processes. AAAI'97/IAAI'97 Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence
Model reduction techniques for computing approximately optimal solutions for Markov decision processes. UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence

Abstract

In recent years, various metrics have been developed for measuring the behavioral similarity of states in probabilistic transition systems [J. Desharnais et al., Proceedings of CONCUR'99, Springer-Verlag, London, 1999, pp. 258-273; F. van Breugel and J. Worrell, Proceedings of ICALP'01, Springer-Verlag, London, 2001, pp. 421-432]. In the context of finite Markov decision processes (MDPs), we have built on these metrics to provide a robust quantitative analogue of stochastic bisimulation [N. Ferns, P. Panangaden, and D. Precup, Proceedings of UAI-04, AUAI Press, Arlington, VA, 2004, pp. 162-169] and an efficient algorithm for its calculation [N. Ferns, P. Panangaden, and D. Precup, Proceedings of UAI-06, AUAI Press, Arlington, VA, 2006, pp. 174-181]. In this paper, we seek to properly extend these bisimulation metrics to MDPs with continuous state spaces. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite-dimensional linear programming, is a crucial first step in formally guiding real-world planning. Real-world tasks, such as robot navigation, are usually continuous and highly stochastic in nature, and often the true model must be replaced by a parametric model or a crude finite approximation. We show that the optimal value function associated with a discounted infinite-horizon planning task is continuous with respect to metric distances. Thus, our metrics allow one to reason about the quality of the solution obtained by replacing one model with another. Alternatively, they may be used directly for state aggregation.
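
The continuity claim in the abstract can be illustrated on a toy example. The following is a minimal sketch, not the paper's sampling-based estimation scheme. It assumes a metric of the weighted fixed-point form d(s,t) = max_a [(1-c)|r(s,a) - r(t,a)| + c K(d)(P(s,a), P(t,a))] with c at least the discount factor, where K is the Kantorovich distance. To stay self-contained it is restricted to deterministic transitions, in which case K between two point masses reduces to d evaluated at the successor states. The three-state MDP and all names (`reward`, `nxt`, `GAMMA`, `C`, `bisim_metric`, `optimal_values`) are hypothetical.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): 3 states, 2 actions,
# deterministic transitions. reward[s][a] is the immediate reward,
# nxt[s][a] is the successor state.
STATES = [0, 1, 2]
ACTIONS = [0, 1]
reward = {0: [0.0, 1.0], 1: [0.0, 0.9], 2: [1.0, 0.0]}
nxt = {0: [1, 2], 1: [0, 2], 2: [2, 0]}

GAMMA = 0.9  # discount factor of the planning task
C = 0.9      # metric weight; the bound checked below assumes C >= GAMMA


def bisim_metric(iters=500):
    """Fixed-point iteration for the assumed metric, starting from zero.

    With deterministic transitions, the Kantorovich term between the two
    next-state point masses is just d at the pair of successor states.
    """
    d = {(s, t): 0.0 for s, t in product(STATES, STATES)}
    for _ in range(iters):
        d = {
            (s, t): max(
                (1 - C) * abs(reward[s][a] - reward[t][a])
                + C * d[(nxt[s][a], nxt[t][a])]
                for a in ACTIONS
            )
            for s, t in product(STATES, STATES)
        }
    return d


def optimal_values(iters=500):
    """Plain value iteration for the discounted optimal value function."""
    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        v = {
            s: max(reward[s][a] + GAMMA * v[nxt[s][a]] for a in ACTIONS)
            for s in STATES
        }
    return v


d = bisim_metric()
v = optimal_values()
for s, t in product(STATES, STATES):
    # Scaled value gap next to the metric distance for each state pair.
    gap = (1 - C) * abs(v[s] - v[t])
    print(s, t, round(d[(s, t)], 4), round(gap, 4))
```

Under these assumptions, each printed pair should show a scaled value gap no larger than the corresponding distance, which is the sense in which states that are close in the metric have close optimal values. Handling stochastic transitions would require solving a transportation linear program for the Kantorovich term at every iteration.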