Unsupervised grammar inference using the minimum description length principle

  • Authors:
  • Upendra Sapkota;Barrett R. Bryant;Alan Sprague

  • Affiliations:
  • Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL;Department of Computer Science and Engineering, University of North Texas, Denton, TX;Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL

  • Venue:
  • MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Context Free Grammars (CFGs) are widely used in programming language descriptions, natural language processing, compilers, and other areas of software engineering where there is a need for describing the syntactic structures of programs. Grammar inference (GI) is the induction of CFGs from sample programs and is a challenging problem. We describe an unsupervised GI approach which uses simplicity as the criterion for directing the inference process and beam search for moving from a complex to a simpler grammar. We use several operators to modify a grammar and use the Minimum Description Length (MDL) Principle to favor simple and compact grammars. The effectiveness of this approach is shown by a case study of a domain specific language. The experimental results show that an accurate grammar can be inferred in a reasonable amount of time.