We describe a technique for forming a context-free grammar for a document that has some kind of tagging (structural or typographical) but for which no concise description of the structure is available. The technique is based on ideas from machine learning. It first forms a set of finite-state automata that describe the document completely. These automata are then modified by considering certain context conditions; the modifications correspond to generalizing the underlying languages. Finally, the automata are converted into regular expressions, which are then used to construct the grammar. An alternative representation, characteristic k-grams, is also introduced. Additionally, the paper describes some interactive operations necessary for generating a grammar for a large and complicated document.
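The first step of the pipeline can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the document is tagged as XML, one automaton is built per element name from the sequences of child element names, the automaton is a plain prefix-tree acceptor, and a k-gram is simply a window of k consecutive symbols padded with boundary markers. The function names child_sequences, prefix_tree_automaton and k_grams are hypothetical, and the generalization and regular-expression steps are not shown.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def child_sequences(xml_text):
    """Map each element name to the child-name sequences observed for it."""
    samples = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        samples[elem.tag].append(tuple(child.tag for child in elem))
    return dict(samples)


def prefix_tree_automaton(sequences):
    """Build a prefix-tree acceptor that accepts exactly the given sequences.

    States are integers and 0 is the start state (an assumed encoding).
    """
    transitions = {}  # (state, symbol) -> next state
    accepting = set()
    next_state = 1
    for seq in sequences:
        state = 0
        for symbol in seq:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def k_grams(sequences, k=2):
    """Collect all windows of k symbols, with '^' and '$' as boundary markers."""
    grams = set()
    for seq in sequences:
        padded = ("^",) * (k - 1) + tuple(seq) + ("$",)
        for i in range(len(padded) - k + 1):
            grams.add(padded[i:i + k])
    return grams
```

Calling child_sequences on a tagged document and passing the sequences for one element name to prefix_tree_automaton yields an automaton that describes the observed structure exactly, with no generalization. The k-gram set is a coarser summary of the same samples; whether it matches the paper's definition of characteristic k-grams is not established here.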