This paper describes a methodology for semiautomatic grammar induction from unannotated corpora of information-seeking queries in a restricted domain. The induced grammar contains both semantic and syntactic structures, which are conducive to (spoken) natural language understanding. Our work aims to reduce the reliance of grammar development on expert handcrafting or on the availability of annotated corpora. To achieve reasonable coverage on real data, as well as portability across domains and languages, we adopt a statistical approach. Agglomerative clustering using the symmetrized divergence criterion groups words “spatially”: these words have similar left and right contexts and tend to form semantic classes. Agglomerative clustering using mutual information groups words “temporally”: these words tend to co-occur sequentially to form phrases or multiword entities. Our approach is amenable to the optional injection of prior knowledge to catalyze grammar induction. The resultant grammar is interpretable by humans and amenable to hand-editing for refinement; hence, our approach is semiautomatic in nature. Experiments were conducted using the ATIS (Air Travel Information Service) corpus, and the semiautomatically induced grammar $G_{SA}$ is compared to an entirely handcrafted grammar $G_H$. $G_H$ took two months to develop and gave concept error rates of 7 percent and 11.3 percent, respectively, in understanding the queries of two test corpora. $G_{SA}$ took only three days to produce and gave concept error rates of 14 percent and 12.2 percent on the corresponding test corpora. These results represent a desirable trade-off between language understanding performance and grammar development effort.
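The abstract names the two clustering criteria but gives no formulas, so the following is only a minimal sketch of one merge step of each pass, under stated assumptions. The spatial distance is assumed here to be the sum of symmetrized Kullback-Leibler divergences between additively smoothed left-context and right-context distributions, and the temporal criterion is assumed to be pointwise mutual information between adjacent words, estimated from unigram and bigram relative frequencies. All function names, the smoothing constant `alpha`, the `<s>`/`</s>` padding tokens, the caller-supplied candidate set, and the `_`-joined phrase tokens are hypothetical illustration choices, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary padding tokens


def context_counts(sentences):
    """Count the left and right neighbours of every word (boundaries padded)."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        padded = [BOS] + sent + [EOS]
        for i in range(1, len(padded) - 1):
            left[padded[i]][padded[i - 1]] += 1
            right[padded[i]][padded[i + 1]] += 1
    return left, right


def smoothed(counter, vocab, alpha=0.1):
    """Additively smoothed distribution over vocab, so every KL term is finite."""
    total = sum(counter.values()) + alpha * len(vocab)
    return {v: (counter[v] + alpha) / total for v in vocab}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for dicts over the same keys."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def sym_div(p, q):
    """Symmetrized divergence D(p || q) + D(q || p) (assumed form)."""
    return kl(p, q) + kl(q, p)


def spatial_merge(sentences, candidates):
    """Spatial step: the candidate pair whose left AND right contexts are closest."""
    left, right = context_counts(sentences)
    vocab = {w for s in sentences for w in s} | {BOS, EOS}

    def dist(a, b):
        return (sym_div(smoothed(left[a], vocab), smoothed(left[b], vocab))
                + sym_div(smoothed(right[a], vocab), smoothed(right[b], vocab)))

    return min(combinations(sorted(candidates), 2), key=lambda ab: dist(*ab))


def temporal_merge(sentences):
    """Temporal step: the adjacent word pair with the highest pointwise MI."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

    def pmi(pair):
        a, b = pair
        p_ab = bigrams[pair] / n_bi
        return math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

    return max(bigrams, key=pmi)


def apply_class(sentences, members, label):
    """Rewrite the corpus, replacing every member word with its class label."""
    return [[label if w in members else w for w in s] for s in sentences]


def apply_phrase(sentences, pair):
    """Rewrite the corpus, fusing each occurrence of the pair into one token."""
    out = []
    for s in sentences:
        merged, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) == pair:
                merged.append(s[i] + "_" + s[i + 1])
                i += 2
            else:
                merged.append(s[i])
                i += 1
        out.append(merged)
    return out


if __name__ == "__main__":
    toy = [s.split() for s in ["fly to boston", "fly to denver", "show me fares"]]
    print(spatial_merge(toy, {"boston", "denver", "fares"}))  # ('boston', 'denver')
    queries = [s.split() for s in ["flights to new york", "fares to new york",
                                   "flights to boston"]]
    pair = temporal_merge(queries)
    print(pair)                            # ('new', 'york')
    print(apply_phrase(queries, pair)[0])  # ['flights', 'to', 'new_york']
```

In this sketch, `boston` and `denver` share identical contexts and so merge into a class, while `new york` fuses into a phrase because the two words always occur together. Repeating either step after re-counting the rewritten corpus (via `apply_class` or `apply_phrase`) gives the agglomerative behaviour; the merged classes and phrases stay human-readable, which is what makes hand-editing of the result practical.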