Semiautomatic Acquisition of Semantic Structures for Understanding Domain-Specific Natural Language Queries

Authors: H. H. Meng; K. C. Siu
Venue: IEEE Transactions on Knowledge and Data Engineering
Year: 2002

Abstract

This paper describes a methodology for semiautomatic grammar induction from unannotated corpora of information-seeking queries in a restricted domain. The induced grammar contains both semantic and syntactic structures, which are conducive to (spoken) natural language understanding. Our work aims to reduce the reliance of grammar development on expert handcrafting or on the availability of annotated corpora. To strive for reasonable coverage on real data, as well as portability across domains and languages, we adopt a statistical approach. Agglomerative clustering using the symmetrized divergence criterion groups words "spatially": these words have similar left and right contexts and tend to form semantic classes. Agglomerative clustering using mutual information groups words "temporally": these words tend to co-occur sequentially to form phrases or multiword entities. Our approach accommodates the optional injection of prior knowledge to catalyze grammar induction. The resulting grammar is interpretable by humans and amenable to hand-editing for refinement; hence, our approach is semiautomatic in nature. Experiments were conducted on the ATIS (Air Travel Information Service) corpus, comparing the semiautomatically induced grammar $G_{SA}$ with an entirely handcrafted grammar $G_H$. $G_H$ took two months to develop and gave concept error rates of 7 percent and 11.3 percent, respectively, in language understanding of two test corpora. $G_{SA}$ took only three days to produce and gave concept error rates of 14 percent and 12.2 percent on the corresponding test corpora. These results represent a desirable trade-off between language understanding performance and grammar development effort.
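The two clustering criteria described in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes add-one smoothing of each word's left and right neighbour distributions, and it takes the symmetrized divergence as the sum of KL divergences in both directions over both contexts. It also instantiates "mutual information" as pointwise mutual information (PMI) between adjacent words. The toy corpus and all function names (`context_distributions`, `symmetric_divergence`, `best_spatial_pair`, `best_temporal_pair`) are hypothetical. Each function returns only the single best merge candidate. A full agglomerative procedure would presumably merge it, relabel the corpus, and iterate, but those details are not given here.

```python
import math
from collections import Counter
from itertools import combinations

BOS, EOS = "<s>", "</s>"  # assumed sentence-boundary markers


def context_distributions(sentences):
    """Return add-one-smoothed left/right neighbour distributions per word (assumed smoothing)."""
    left, right = {}, {}
    for sent in sentences:
        toks = [BOS] + sent + [EOS]
        for i in range(1, len(toks) - 1):
            left.setdefault(toks[i], Counter())[toks[i - 1]] += 1
            right.setdefault(toks[i], Counter())[toks[i + 1]] += 1
    # Context vocabulary: every word plus the boundary markers.
    ctx_vocab = sorted(set(left) | {BOS, EOS})

    def smooth(counts):
        total = sum(counts.values()) + len(ctx_vocab)
        return {c: (counts[c] + 1) / total for c in ctx_vocab}

    return ({w: smooth(c) for w, c in left.items()},
            {w: smooth(c) for w, c in right.items()})


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared support."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)


def symmetric_divergence(a, b, left, right):
    """Symmetrized KL divergence summed over left and right contexts (hypothetical form)."""
    return (kl(left[a], left[b]) + kl(left[b], left[a])
            + kl(right[a], right[b]) + kl(right[b], right[a]))


def best_spatial_pair(sentences):
    """'Spatial' step: word pair with the most similar contexts (candidate semantic class)."""
    left, right = context_distributions(sentences)
    pairs = combinations(sorted(left), 2)
    return min(pairs, key=lambda ab: symmetric_divergence(ab[0], ab[1], left, right))


def best_temporal_pair(sentences):
    """'Temporal' step: adjacent word pair with the highest PMI (candidate phrase)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

    def pmi(pair):
        a, b = pair
        p_ab = bigrams[pair] / n_bi
        return math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

    return max(bigrams, key=pmi)


# Hypothetical toy corpus of ATIS-style queries.
corpus = [s.split() for s in [
    "show flights from boston to san francisco",
    "show flights from san francisco to denver",
    "show fares from denver to boston",
]]
print(best_spatial_pair(corpus))   # ('boston', 'denver')
print(best_temporal_pair(corpus))  # ('san', 'francisco')
```

On this toy corpus, "boston" and "denver" share identical left and right contexts and are proposed as a semantic class, while "san francisco" has the highest PMI and is proposed as a multiword entity. Plain PMI is known to favour rare word pairs, so a practical system would likely add frequency thresholds or a stopping criterion.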