Improved topic-dependent language modeling using information retrieval techniques

  • Authors:
  • M. Mahajan; D. Beeferman; X. D. Huang

  • Affiliations:
  • Microsoft Corp., Redmond, WA, USA

  • Venue:
  • ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1999


Abstract

N-gram language models are frequently used by speech recognition systems to constrain and guide the search. An N-gram model uses only the last N-1 words to predict the next word, with typical values of N ranging from 2 to 4. N-gram language models therefore lack long-term context information. We show that the predictive power of N-gram language models can be improved by using long-term context information about the topic of discussion. We use information retrieval techniques to generalize the available context information for topic-dependent language modeling. We demonstrate the effectiveness of this technique through experiments on the Wall Street Journal text corpus, a difficult task for topic-dependent language modeling because the text is relatively homogeneous. The proposed method reduces the perplexity of the baseline language model by 37%, demonstrating the predictive power of the topic-dependent language model.
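
The abstract does not specify which retrieval technique or combination scheme the authors used, so the following is only a minimal illustrative sketch of the general idea: treat the long-term history as a query, retrieve topically similar documents with TF-IDF cosine similarity, estimate a "topic" bigram model from them, and linearly interpolate it with a baseline bigram model before measuring perplexity. All documents, parameter values (k, lam, alpha), and the add-alpha smoothing are hypothetical choices for the sketch, not details from the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper uses Wall Street Journal text.
docs = [
    "the fed raised interest rates today".split(),
    "stocks fell as interest rates climbed".split(),
    "the team won the championship game".split(),
    "the coach praised the game plan".split(),
]

def bigram_counts(token_lists):
    """Collect unigram and bigram counts over a set of token lists."""
    uni, bi = Counter(), Counter()
    for toks in token_lists:
        padded = ["<s>"] + toks
        uni.update(padded)
        bi.update(zip(padded, padded[1:]))
    return uni, bi

def bigram_prob(uni, bi, w_prev, w, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram probability (a stand-in for whatever
    smoothing the paper actually used)."""
    return (bi[(w_prev, w)] + alpha) / (uni[w_prev] + alpha * vocab_size)

def tfidf_vector(tokens, df, n_docs):
    """Plain TF-IDF weights, computed with the standard library only."""
    tf = Counter(tokens)
    return {w: tf[w] * math.log(n_docs / (1 + df[w])) for w in tf}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Document frequencies for the IDF term.
df = Counter(w for d in docs for w in set(d))

def topic_adapted_perplexity(history, test_tokens, k=2, lam=0.5):
    """Interpolate a baseline bigram model with a 'topic' bigram model
    estimated from the k documents most similar (by TF-IDF cosine) to
    the long-term history; k and lam are illustrative values."""
    base_uni, base_bi = bigram_counts(docs)
    vocab_size = len(set(base_uni))
    hist_vec = tfidf_vector(history, df, len(docs))
    ranked = sorted(docs,
                    key=lambda d: -cosine(hist_vec, tfidf_vector(d, df, len(docs))))
    topic_uni, topic_bi = bigram_counts(ranked[:k])
    log_prob, prev = 0.0, "<s>"
    for w in test_tokens:
        p = (lam * bigram_prob(topic_uni, topic_bi, prev, w, vocab_size)
             + (1 - lam) * bigram_prob(base_uni, base_bi, prev, w, vocab_size))
        log_prob += math.log2(p)
        prev = w
    # Perplexity = 2^(-average log2 probability per word).
    return 2 ** (-log_prob / len(test_tokens))

print(topic_adapted_perplexity(
    history="interest rates rose again".split(),
    test_tokens="rates climbed today".split()))
```

Because the topic model is re-estimated from documents retrieved with the long-term history, words that are likely under the current topic (here, finance terms) receive higher probability than under the baseline model alone, which is the mechanism behind the reported perplexity reduction.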