The SuperARV language model: investigating the effectiveness of tightly integrating multiple knowledge sources

Authors: Wen Wang; Mary P. Harper
Affiliations: Purdue University, West Lafayette, IN; Purdue University, West Lafayette, IN
Venue: EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Year: 2002

Abstract

This paper presents a new almost-parsing language model that incorporates multiple knowledge sources and is based on the concept of Constraint Dependency Grammars. Lexical features and syntactic constraints are tightly integrated into a uniform linguistic structure called a SuperARV, which is associated with a word in the lexicon. The SuperARV language model reduces perplexity and word error rate compared to trigram, part-of-speech-based, and parser-based language models. We also investigate the relative contributions of the knowledge sources to the strength of the model by applying constraint relaxation at the level of the knowledge sources. Although each knowledge source contributes to language model quality, lexical features are an outstanding contributor when they are tightly integrated with word identity and syntactic constraints. Our investigation also suggests possible reasons for the reported poor performance of several probabilistic dependency grammar models in the literature.
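The abstract names two evaluation metrics, perplexity and word error rate. As a reminder of what they measure, here is a minimal sketch of their standard textbook definitions. It is not the paper's evaluation code; the function names `perplexity` and `word_error_rate` and the toy inputs are assumptions made purely for illustration.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence from the model's per-word probabilities.

    Defined as exp of the average negative log-probability per word.
    `word_probs` is a hypothetical list of P(w_i | history) values.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Lower values are better for both metrics: a model that assigns higher probability to the test words has lower perplexity, and a recognizer whose output is closer to the reference transcript has lower word error rate.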