Towards Feasible PAC-Learning of Probabilistic Deterministic Finite Automata

  • Authors:
  • Jorge Castro;Ricard Gavaldà

  • Affiliations:
  • Departament de Llenguatges i Sistemes Informàtics LARCA Research Group, Universitat Politècnica de Catalunya, Barcelona;Departament de Llenguatges i Sistemes Informàtics LARCA Research Group, Universitat Politècnica de Catalunya, Barcelona

  • Venue:
  • ICGI '08 Proceedings of the 9th International Colloquium on Grammatical Inference: Algorithms and Applications
  • Year:
  • 2008

Abstract

We present an improvement of an algorithm due to Clark and Thollard (Journal of Machine Learning Research, 2004) for PAC-learning distributions generated by Probabilistic Deterministic Finite Automata (PDFA). Our algorithm is an attempt to keep the rigorous guarantees of the original one while using sample sizes that are not as astronomical as those predicted by the theory. We prove that our algorithm indeed PAC-learns in a stronger sense than the Clark-Thollard algorithm. We also perform very preliminary experiments: on a few small targets (8-10 states), the algorithm requires only hundreds of examples to identify the target. We also test the algorithm on a web logfile recording about a hundred thousand sessions from an e-commerce site, from which it is able to extract some nontrivial structure in the form of a PDFA with 30-50 states. An additional feature, which in fact partly explains the reduction in sample size, is that our algorithm does not need as input any information about the distinguishability of the target.
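
For readers unfamiliar with the object being learned, the following is a minimal sketch of a PDFA as a distribution over strings. It is not the learning algorithm described in the abstract. The two-state automaton, its transition probabilities, the `STOP` marker, and the function names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical two-state PDFA over the alphabet {"a", "b"} (not from the paper).
# Each state maps a symbol to (probability, next_state).
# STOP is an assumed end-of-string marker; its next_state is unused.
STOP = "$"
PDFA = {
    0: {"a": (0.5, 1), "b": (0.25, 0), STOP: (0.25, None)},
    1: {"a": (0.25, 0), "b": (0.25, 1), STOP: (0.5, None)},
}


def string_probability(pdfa, s, start=0):
    """Probability that the PDFA generates exactly the string s."""
    state = start
    prob = 1.0
    for sym in s:
        # A symbol with no transition (or the stop marker itself) has probability 0.
        if sym not in pdfa[state] or sym == STOP:
            return 0.0
        p, state = pdfa[state][sym]
        prob *= p
    # The string must also end here, so multiply by the stopping probability.
    return prob * pdfa[state][STOP][0]


def sample_string(pdfa, rng, start=0):
    """Draw one string by following transitions until STOP is emitted."""
    state = start
    out = []
    while True:
        symbols = list(pdfa[state])
        weights = [pdfa[state][sym][0] for sym in symbols]
        sym = rng.choices(symbols, weights=weights, k=1)[0]
        if sym == STOP:
            return "".join(out)
        out.append(sym)
        state = pdfa[state][sym][1]


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_string(PDFA, rng) for _ in range(5)])
    print(string_probability(PDFA, "ab"))
```

Because every transition is deterministic given the current state and symbol, the probability of a string is a single product along one path. A learner in the PAC setting sees only samples such as those returned by `sample_string` and must recover an automaton whose distribution is close to the target's.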