A faster strongly polynomial minimum cost flow algorithm
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Bisimulation through probabilistic testing
Information and Computation
The formal semantics of programming languages: an introduction
Markov Decision Processes: Discrete Stochastic Dynamic Programming
A Calculus of Communicating Systems
Performance measure sensitive congruences for Markovian process algebras
Theoretical Computer Science
Diffusion Kernels on Graphs and Other Discrete Input Spaces
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
The Metric Analogue of Weak Bisimulation for Probabilistic Processes
LICS '02 Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science
Concurrency and Automata on Infinite Sequences
Proceedings of the 5th GI-Conference on Theoretical Computer Science
Equivalence notions and model minimization in Markov decision processes
Artificial Intelligence - special issue on planning with uncertainty and incomplete information
UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
Transfer of samples in batch reinforcement learning
Proceedings of the 25th international conference on Machine learning
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
ALT '07 Proceedings of the 18th international conference on Algorithmic Learning Theory
Improving Batch Reinforcement Learning Performance through Transfer of Samples
Proceedings of the 2008 conference on STAIRS 2008: Proceedings of the Fourth Starting AI Researchers' Symposium
Solving factored MDPs with hybrid state and action variables
Journal of Artificial Intelligence Research
The Kantorovich Metric in Computer Science: A Brief Survey
Electronic Notes in Theoretical Computer Science (ENTCS)
Equivalence relations in fully and partially observable Markov decision processes
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Using bisimulation for policy transfer in MDPs
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1
Smarter sampling in model-based Bayesian reinforcement learning
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I
Learning from demonstration using MDP induced metrics
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Basis function discovery using spectral clustering and bisimulation metrics
ALA'11 Proceedings of the 11th international conference on Adaptive and Learning Agents
Bisimulation Metrics for Continuous Markov Decision Processes
SIAM Journal on Computing
Automatic construction of temporally extended actions for MDPs using bisimulation metrics
EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
A pseudometric in supervisory control of probabilistic discrete event systems
Discrete Event Dynamic Systems
On-the-Fly exact computation of bisimilarity distances
TACAS'13 Proceedings of the 19th international conference on Tools and Algorithms for the Construction and Analysis of Systems
The BisimDist library: efficient computation of bisimilarity distances for Markovian models
QEST'13 Proceedings of the 10th international conference on Quantitative Evaluation of Systems
We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
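The fixed point underlying such metrics can be sketched directly: iterate d(s, t) = max_a [ |R(s, a) - R(t, a)| + γ · W(d)(P(s, a), P(t, a)) ], where W(d) is the Kantorovich distance between successor distributions. The sketch below uses a hypothetical three-state MDP (not taken from the paper) with deterministic transitions, so the Kantorovich term collapses to the distance between the two successor states and no linear-program solve is needed; all names and numbers are illustrative assumptions.

```python
# Minimal sketch of the bisimulation-metric fixed-point iteration,
# assuming a hypothetical MDP with deterministic transitions.
GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = ["a", "b"]

# Hypothetical rewards R[s][a] and deterministic successors P[s][a].
R = {0: {"a": 1.0, "b": 0.0},
     1: {"a": 1.0, "b": 0.0},
     2: {"a": 0.0, "b": 1.0}}
P = {0: {"a": 0, "b": 1},
     1: {"a": 1, "b": 0},
     2: {"a": 2, "b": 2}}

def bisim_metric(iters=200):
    # Start from the zero pseudometric and apply the update map repeatedly;
    # for deterministic transitions the Kantorovich term W(d)(P(s,a), P(t,a))
    # is just d evaluated at the two successor states.
    d = {(s, t): 0.0 for s in STATES for t in STATES}
    for _ in range(iters):
        d = {(s, t): max(abs(R[s][a] - R[t][a])
                         + GAMMA * d[(P[s][a], P[t][a])]
                         for a in ACTIONS)
             for s in STATES for t in STATES}
    return d

d = bisim_metric()
```

In this toy MDP, states 0 and 1 end up at distance 0 (they are bisimilar), while state 2 sits at distance 1/(1 - γ) = 10 from both, since its rewards differ by 1 under every action at every step. The value-function bound mentioned in the abstract then guarantees |V*(s) - V*(t)| is at most d(s, t).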