Learning constraint-based grammars from representative examples: theory and applications

  • Authors:
Kathleen McKeown; Smaranda Muresan

  • Affiliations:
Columbia University; Columbia University

  • Venue:
  • Learning constraint-based grammars from representative examples: theory and applications
  • Year:
  • 2006


Abstract

Computationally efficient models for natural language understanding can have a wide variety of applications, ranging from text mining and question answering to natural language interfaces to databases. Constraint-based grammar formalisms have been widely used for deep language understanding. Yet one serious obstacle to their use in real-world applications is that these formalisms have overlooked an important requirement: learnability. Currently, there is a poor match between these grammar formalisms and existing learning methods. This dissertation defines a new type of constraint-based grammar, Lexicalized Well-Founded Grammars (LWFGs), which allow deep language understanding and are learnable. These grammars model both syntax and semantics and have constraints at the rule level for semantic composition and semantic interpretation. The interpretation constraints allow access to meaning during language processing; they establish links between linguistic expressions and the entities they refer to in the real world. We use an ontology-based interpretation, proposing a semantic representation that can be conceived as an ontology query language. This representation is sufficiently expressive to represent many aspects of language, yet sufficiently restrictive to support learning and tractable inference. In this thesis, we propose a new relational learning model for LWFG induction. The learner is presented with a small set of positive representative examples, which consist of utterances paired with their semantic representations. We have proved that the search space for grammar induction is a complete grammar lattice, which allows the construction and generalization of hypotheses and guarantees the uniqueness of the solution, regardless of the order of learning. We have proved a learnability theorem and have provided polynomial algorithms for LWFG induction, proving their soundness.
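The idea of rule-level constraints for semantic composition and ontology-based interpretation can be illustrated with a minimal, hypothetical sketch. This is not the dissertation's actual LWFG formalism; the toy ontology, the `compose` function, and the `interpret` check are all invented here purely to show the flavor of composing a meaning and then validating it against an ontology.

```python
# Toy ontology (hypothetical): which attribute values are admissible per concept.
ONTOLOGY = {
    "car": {"color": {"red", "blue"}},
}

def compose(head_sem, mod_sem):
    """Semantic composition constraint: merge the modifier's attributes into the head's meaning."""
    merged = dict(head_sem)
    merged.update(mod_sem)
    return merged

def interpret(sem):
    """Interpretation constraint: the composed meaning must be licensed by the ontology."""
    concept = sem.get("isa")
    allowed = ONTOLOGY.get(concept, {})
    return all(attr == "isa" or val in allowed.get(attr, set())
               for attr, val in sem.items())

# "red car": adjective + noun meanings are composed, then checked against the ontology.
noun_sem = {"isa": "car"}
adj_sem = {"color": "red"}
sem = compose(noun_sem, adj_sem)
assert interpret(sem)                                         # "red car" is licensed
assert not interpret(compose(noun_sem, {"color": "green"}))   # "green car" is not
```

The point of the sketch is that interpretation happens *during* processing: a composition whose result the ontology does not license can be rejected immediately, linking linguistic expressions to the entities they may refer to.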
The learnability theorem significantly extends the class of problems learnable by Inductive Logic Programming methods. In this dissertation, we have implemented a system that serves as an experimental platform for all the theoretical algorithms. The system has the practical advantage of implementing sound grammar revision and grammar merging, which allow incremental coverage of natural language fragments. We have provided qualitative evaluations that cover the following issues: coverage of diverse and complex linguistic phenomena; terminological knowledge acquisition from natural language definitions; and handling of both precise and vague questions with precise answers at the concept level.
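One generalization step in example-driven induction can be sketched as follows. This is a crude least-general-generalization over flat attribute maps, invented here for illustration, not the dissertation's LWFG induction algorithm; it only shows how two representative examples can yield a hypothesis that keeps shared structure and abstracts clashing values, independently of the order in which the examples are seen.

```python
def lgg(sem_a, sem_b):
    """Generalize two flat semantic forms: shared attribute/value pairs survive,
    clashing values are abstracted to a variable (here written "?X")."""
    hyp = {}
    for attr in sem_a.keys() & sem_b.keys():
        hyp[attr] = sem_a[attr] if sem_a[attr] == sem_b[attr] else "?X"
    return hyp

# Two representative examples ("red car", "blue car") generalize to one hypothesis.
ex1 = {"isa": "car", "color": "red"}
ex2 = {"isa": "car", "color": "blue"}
hypothesis = lgg(ex1, ex2)
assert hypothesis == {"isa": "car", "color": "?X"}

# Order-independence, mirroring the uniqueness guarantee: the operation is symmetric.
assert lgg(ex2, ex1) == hypothesis
```

In the dissertation's setting the analogous guarantee is much stronger: the complete grammar lattice ensures a unique solution regardless of the order of learning.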