Regular languages are widely used in NLP today in spite of their shortcomings. Efficient algorithms are needed that can reliably learn these languages and that, in realistic applications, use only positive samples. These languages are not learnable under traditional distribution-free criteria. We argue that an appropriate learning framework is PAC learning in which the distributions are constrained to be generated by a class of stochastic automata whose support equals the target concept. We discuss how this relates to other learning paradigms. We then present a simple learning algorithm for regular languages, together with a self-contained proof that it learns according to this partially distribution-free criterion.
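To make the setting concrete, the following is a minimal, hypothetical sketch of learning from positive samples only: it builds a prefix tree acceptor over the sample and merges states whose observed suffix sets coincide, yielding a deterministic automaton consistent with the data. This illustrates the general state-merging approach; it is not the paper's algorithm, and it carries none of the PAC analysis.

```python
from collections import defaultdict

def learn_from_positive(sample):
    """Toy state-merging learner (illustrative sketch only):
    build a prefix tree acceptor from positive strings, then merge
    prefixes that are followed by exactly the same set of suffixes."""
    # For every prefix occurring in the sample, record the suffixes seen after it.
    suffixes = defaultdict(set)
    for w in sample:
        for i in range(len(w) + 1):
            suffixes[w[:i]].add(w[i:])
    # Prefixes with identical suffix sets are merged into one state.
    # (The suffix set of p+a is determined by that of p, so transitions
    # on merged states are consistent.)
    state_of, canon = {}, {}
    for p, s in suffixes.items():
        key = frozenset(s)
        canon.setdefault(key, len(canon))
        state_of[p] = canon[key]
    # Build transitions and accepting states of the merged automaton.
    delta, accept = {}, set()
    for p, s in suffixes.items():
        if '' in s:
            accept.add(state_of[p])
        for a in {w[0] for w in s if w}:
            delta[(state_of[p], a)] = state_of[p + a]
    return state_of[''], delta, accept

def accepts(start, delta, accept, w):
    """Run the learned deterministic automaton on string w."""
    q = start
    for a in w:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in accept
```

On a sample such as `["a", "aa"]` the learner returns an automaton accepting exactly the sample; without the frequency information carried by a stochastic automaton, a merging criterion this strict cannot generalize, which is one motivation for constraining the distributions as described above.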