Bisimulation through probabilistic testing. Information and Computation.
Elements of information theory.
Introduction to Reinforcement Learning.
Metrics for labelled Markov processes. Theoretical Computer Science - Logic, Semantics and Theory of Programming.
Predictive state representations: a new theory for modeling dynamical systems. UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence.
Planning and acting in partially observable stochastic domains. Artificial Intelligence.
Testing probabilistic equivalence through reinforcement learning. FSTTCS'06: Proceedings of the 26th International Conference on Foundations of Software Technology and Theoretical Computer Science.
Online testing with reinforcement learning. FATES'06/RV'06: Proceedings of the First Combined International Conference on Formal Approaches to Software Testing and Runtime Verification.
Testing probabilistic equivalence through reinforcement learning
Information and Computation
We propose a new approach for estimating the difference between two partially observable dynamical systems. We assume that one can interact with the systems by performing actions and receiving observations. The key idea is to define a Markov Decision Process (MDP) based on the systems to be compared, in such a way that the optimal value of the MDP's initial state can be interpreted as a divergence (or dissimilarity) between the systems. This dissimilarity can then be estimated by reinforcement learning methods. Moreover, the optimal policy contains information about the actions that most distinguish the systems. Empirical results show that this approach is useful for detecting both large and small differences, as well as for comparing systems with different internal structure.
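The idea of the abstract can be illustrated with a much-simplified sketch. The construction below is hypothetical and not the paper's actual MDP: each "system" is reduced to a single-step map from actions to observation distributions, the reward is an indicator that the two systems emit different observations under the same action, and tabular Q-learning then estimates both a dissimilarity score (the best Q-value) and the most distinguishing action (the argmax), mirroring the role the optimal value and optimal policy play in the text. All names (`SYSTEM_A`, `estimate_divergence`, etc.) are invented for this sketch.

```python
import random

# Two hypothetical partially observable systems, reduced here to a map from
# an action to a distribution over observations (obs, probability) pairs.
# They agree under action "a" and differ sharply under action "b".
SYSTEM_A = {"a": [("x", 0.5), ("y", 0.5)], "b": [("x", 0.9), ("y", 0.1)]}
SYSTEM_B = {"a": [("x", 0.5), ("y", 0.5)], "b": [("x", 0.1), ("y", 0.9)]}

def sample(dist, rng):
    """Draw one observation from a list of (obs, probability) pairs."""
    r, acc = rng.random(), 0.0
    for obs, p in dist:
        acc += p
        if r < acc:
            return obs
    return dist[-1][0]

def estimate_divergence(sys_a, sys_b, episodes=20000, eps=0.1, alpha=0.05, seed=0):
    """One-step Q-learning sketch of the paper's idea: reward 1 when the two
    systems emit different observations under the same action.  The largest
    learned Q-value estimates how distinguishable the systems are, and the
    argmax action is the one that most distinguishes them."""
    rng = random.Random(seed)
    actions = sorted(sys_a)
    q = {act: 0.0 for act in actions}
    for _ in range(episodes):
        # Epsilon-greedy action selection over the single (initial) state.
        if rng.random() < eps:
            act = rng.choice(actions)
        else:
            act = max(q, key=q.get)
        # Execute the chosen action in both systems; reward = observations differ.
        reward = 1.0 if sample(sys_a[act], rng) != sample(sys_b[act], rng) else 0.0
        # Incremental (constant step size) Q-value update.
        q[act] += alpha * (reward - q[act])
    best = max(q, key=q.get)
    return best, q

best, q = estimate_divergence(SYSTEM_A, SYSTEM_B)
print(best, {a: round(v, 2) for a, v in q.items()})
```

Under these toy distributions, action "b" yields differing observations with probability 0.9 * 0.9 + 0.1 * 0.1 = 0.82, versus 0.5 for action "a", so the learner should converge on "b" as the most distinguishing action. The full approach in the paper handles multi-step interaction and internal state, which this one-step sketch deliberately omits.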