Unsupervised grammar inference using the minimum description length principle

Authors:
Upendra Sapkota; Barrett R. Bryant; Alan Sprague
Affiliations:
Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL; Department of Computer Science and Engineering, University of North Texas, Denton, TX; Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL
Venue:
MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2012

Abstract

Context-free grammars (CFGs) are widely used in programming language descriptions, natural language processing, compilers, and other areas of software engineering that require describing the syntactic structure of programs. Grammar inference (GI), the induction of CFGs from sample programs, is a challenging problem. We describe an unsupervised GI approach that uses simplicity as the criterion for directing the inference process and beam search for moving from a complex grammar to a simpler one. We use several operators to modify a grammar and the Minimum Description Length (MDL) principle to favor simple, compact grammars. The effectiveness of this approach is shown by a case study of a domain-specific language. The experimental results show that an accurate grammar can be inferred in a reasonable amount of time.
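
To make the idea of MDL-guided beam search concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify their operators or their encoding, so everything here is an assumption made for illustration. The sketch starts from the most complex grammar (one start-symbol alternative per sample), applies a single hypothetical operator, `chunk`, which replaces a repeated pair of adjacent symbols with a new nonterminal, and scores each grammar with a crude description length (symbol count times bits per symbol). Because `chunk` is lossless, the cost of encoding the samples given the grammar stays constant and is left out of the score. The function names and the `N1`, `N2`, ... naming scheme are likewise hypothetical, and sample tokens are assumed not to collide with them or with `S`.

```python
import math
from collections import Counter


def initial_grammar(samples):
    """Most complex grammar: the start symbol S lists every sample verbatim."""
    return {"S": [tuple(s) for s in samples]}


def description_length(grammar):
    """Crude MDL score in bits: every right-hand-side symbol, plus one
    end-of-rule marker per rule, costs log2(vocabulary size) bits."""
    symbols = set(grammar)
    for alternatives in grammar.values():
        for rhs in alternatives:
            symbols.update(rhs)
    vocab = len(symbols) + 1  # +1 for the end-of-rule marker
    n_tokens = sum(len(rhs) + 1 for alts in grammar.values() for rhs in alts)
    return n_tokens * math.log2(vocab)


def digram_counts(grammar):
    """Count adjacent symbol pairs over all right-hand sides."""
    counts = Counter()
    for alternatives in grammar.values():
        for rhs in alternatives:
            counts.update(zip(rhs, rhs[1:]))
    return counts


def chunk(grammar, pair, name):
    """Illustrative operator: replace each non-overlapping occurrence of
    `pair` with the new nonterminal `name`, and add the rule name -> pair."""
    def rewrite(rhs):
        out, i = [], 0
        while i < len(rhs):
            if tuple(rhs[i:i + 2]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(rhs[i])
                i += 1
        return tuple(out)

    new = {nt: [rewrite(rhs) for rhs in alts] for nt, alts in grammar.items()}
    new[name] = [pair]
    return new


def beam_search(samples, beam_width=3, branch=3, max_steps=50):
    """Keep the `beam_width` shortest grammars at each step; stop as soon as
    the best candidate no longer shortens the description."""
    best = initial_grammar(samples)
    beam = [best]
    for _ in range(max_steps):
        candidates = []
        for g in beam:
            for pair, count in digram_counts(g).most_common(branch):
                if count >= 2:  # chunking a unique pair cannot help
                    candidates.append(chunk(g, pair, f"N{len(g)}"))
        if not candidates:
            break
        candidates.sort(key=description_length)
        beam = candidates[:beam_width]
        if description_length(beam[0]) >= description_length(best):
            break
        best = beam[0]
    return best


def expand(grammar, rhs):
    """Expand nonterminals back to terminals (checks that chunking is lossless)."""
    out = []
    for sym in rhs:
        if sym in grammar:
            out.extend(expand(grammar, grammar[sym][0]))
        else:
            out.append(sym)
    return out


# Toy "programs" from an imaginary turtle-style DSL, already tokenized.
samples = ["fd 10 ; fd 10 ; fd 10 ; fd 10".split(), "fd 10 ; fd 10".split()]
initial = initial_grammar(samples)
best = beam_search(samples)
print(f"{description_length(initial):.1f} bits -> {description_length(best):.1f} bits")
for nonterminal, alternatives in best.items():
    for rhs in alternatives:
        print(nonterminal, "->", " ".join(rhs))
```

A realistic system would also need generalizing operators (for example, merging nonterminals) and a term for the cost of encoding the samples given the grammar, since those are what let MDL trade grammar size against fit. This sketch only compresses; it never generalizes beyond the samples.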