Towards Feasible PAC-Learning of Probabilistic Deterministic Finite Automata

Authors:
Jorge Castro; Ricard Gavaldà
Affiliations:
Departament de Llenguatges i Sistemes Informàtics, LARCA Research Group, Universitat Politècnica de Catalunya, Barcelona (both authors)
Venue:
ICGI '08 Proceedings of the 9th international colloquium on Grammatical Inference: Algorithms and Applications
Year:
2008

References (12)
On the learnability of discrete distributions. STOC '94 Proceedings of the twenty-sixth annual ACM symposium on Theory of computing.
On the learnability and usage of acyclic probabilistic finite automata. Journal of Computer and System Sciences - Special issue on the eighth annual workshop on computational learning theory, July 5–8, 1995.
Stochastic Grammatical Inference with Multinomial Tests. ICGI '02 Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications.
Probabilistic DFA Inference using Kullback-Leibler Divergence and Minimality. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning.
Learning Stochastic Regular Grammars by Means of a State Merging Method. ICGI '94 Proceedings of the Second International Colloquium on Grammatical Inference and Applications.
Identification of DFA: data-dependent vs data-independent algorithms. ICGI '96 Proceedings of the 3rd International Colloquium on Grammatical Inference: Learning Syntax from Sentences.
PAC-learnability of Probabilistic Deterministic Finite State Automata. The Journal of Machine Learning Research.
Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern Recognition.
PAC-Learning of Markov models with hidden state. ECML '06 Proceedings of the 17th European conference on Machine Learning.
PAC-learnability of probabilistic deterministic finite state automata in terms of variation distance. ALT '05 Proceedings of the 16th international conference on Algorithmic Learning Theory.
Learnability of probabilistic automata via oracles. ALT '05 Proceedings of the 16th international conference on Algorithmic Learning Theory.
Learning rational stochastic languages. COLT '06 Proceedings of the 19th annual conference on Learning Theory.

Cited by (4)
Learning PDFA with asynchronous transitions. ICGI '10 Proceedings of the 10th international colloquium conference on Grammatical inference: theoretical results and applications.
A lower bound for learning distributions generated by probabilistic automata. ALT '10 Proceedings of the 21st international conference on Algorithmic learning theory.
Learning probabilistic automata: A study in state distinguishability. Theoretical Computer Science.
Software model synthesis using satisfiability solvers. Empirical Software Engineering.

Abstract

We present an improvement of an algorithm due to Clark and Thollard (Journal of Machine Learning Research, 2004) for PAC-learning distributions generated by Probabilistic Deterministic Finite Automata (PDFA). Our algorithm attempts to keep the rigorous guarantees of the original one while using sample sizes less astronomical than those predicted by the theory. We prove that our algorithm indeed PAC-learns, in a stronger sense than the Clark-Thollard algorithm. We also report very preliminary experiments. On a few small targets (8-10 states), the algorithm requires only hundreds of examples to identify the target. We also test it on a web log file recording about a hundred thousand sessions from an e-commerce site, from which it extracts some nontrivial structure in the form of a PDFA with 30-50 states. An additional feature, which in fact partly explains the reduction in sample size, is that our algorithm does not need any information about the distinguishability of the target as input.
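To make the abstract's vocabulary concrete, here is a minimal Python sketch built on assumptions of our own. A PDFA is stored in hypothetical dictionaries `transitions` and `stop` (these names and this layout are illustrative, not taken from the paper). It generates a string by walking from the initial state until it draws the stop event. Two candidate states can be compared through the empirical L-infinity distance between the strings observed from them, which is the quantity that distinguishability refers to. The function `looks_distinct` is a simplified, hypothetical threshold test, not the statistical test used in the paper. Its threshold is computed only from the sample sizes and a confidence parameter, which illustrates how a learner can avoid taking the target's distinguishability as input.

```python
import math
import random
from collections import Counter


class PDFA:
    """Toy PDFA (hypothetical representation, not the paper's).

    transitions[q][a] = (next_state, probability of emitting a in q)
    stop[q]           = probability of ending the string in q
    For every state q, stop[q] plus its outgoing probabilities sum to 1.
    """

    def __init__(self, transitions, stop, initial=0):
        self.transitions = transitions
        self.stop = stop
        self.initial = initial

    def sample(self, rng):
        """Draw one string from the distribution the PDFA generates."""
        state, out = self.initial, []
        while True:
            r = rng.random()
            acc = self.stop[state]
            if r < acc:
                return "".join(out)
            for symbol, (nxt, p) in self.transitions.get(state, {}).items():
                acc += p
                if r < acc:
                    out.append(symbol)
                    state = nxt
                    break
            else:
                # No branch fired (floating-point round-off): end the string.
                return "".join(out)

    def probability(self, string):
        """Probability that the PDFA generates exactly `string`."""
        state, p = self.initial, 1.0
        for symbol in string:
            edge = self.transitions.get(state, {}).get(symbol)
            if edge is None:
                return 0.0
            state, q = edge
            p *= q
        return p * self.stop[state]


def linf_distance(sample_a, sample_b):
    """Empirical L-infinity distance: max over strings of |freq_a - freq_b|.

    Both samples must be non-empty.
    """
    ca, cb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    return max(abs(ca[s] / na - cb[s] / nb) for s in set(ca) | set(cb))


def looks_distinct(sample_a, sample_b, delta=0.05):
    """Illustrative test (an assumption, not the paper's test).

    The threshold depends only on the sample sizes and the confidence
    parameter delta, so no distinguishability bound is passed in. Each
    term is a per-string Hoeffding radius; a rigorous test would also
    have to account for taking the maximum over all strings.
    """
    na, nb = len(sample_a), len(sample_b)
    radius = (math.sqrt(math.log(4 / delta) / (2 * na))
              + math.sqrt(math.log(4 / delta) / (2 * nb)))
    return linf_distance(sample_a, sample_b) > radius
```

For instance, `PDFA({0: {"a": (0, 0.5)}}, {0: 0.5})` generates the string made of n copies of `a` with probability 0.5^(n+1), so `probability("aa")` returns 0.125. The design choice worth noting is that the test's threshold shrinks as samples grow, so larger samples let it separate states whose distributions are closer, without the caller fixing a distinguishability bound in advance.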