A Cache-Based Natural Language Model for Speech Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
Class-based n-gram models of natural language
Computational Linguistics
Testing the correlation of word error rate and perplexity
Speech Communication
Latent Dirichlet Allocation
The Journal of Machine Learning Research
Topic identification in discourse
EACL '95: Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics
Factored language models and generalized parallel backoff
NAACL-Short '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Companion Volume: Short Papers - Volume 2
Language modeling with sentence-level mixtures
HLT '94: Proceedings of the Workshop on Human Language Technology
A novel word clustering algorithm based on latent semantic analysis
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Word Topic Models for Spoken Document Retrieval and Transcription
ACM Transactions on Asian Language Information Processing (TALIP)
Probabilistic latent semantic analysis
UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
Language models (LMs) are an important component of automatic speech recognition (ASR) systems: an LM guides the acoustic model toward the word sequence that best matches a given speech signal. Without one, an ASR system has no knowledge of the language, and recovering the correct word sequence becomes difficult. In recent years, researchers have tried to incorporate long-range dependencies into statistical word-based n-gram LMs. One such long-range dependency is topic. Unlike words, a topic is not directly observable, so the meaning behind the words must be inferred in order to identify it. This work is based on the premise that nouns carry topic information. We propose a new approach to topic-dependent language modeling in which the topic is determined in an unsupervised manner. Latent Semantic Analysis (LSA) is employed to reveal hidden (latent) relations among the nouns in the context words. To decide the topic of an event, a fixed-size window of the word history is observed, and voting is carried out over noun-class occurrences weighted by a confidence measure. Experiments were conducted on an English corpus and a Japanese corpus: the Wall Street Journal corpus and the Mainichi Shimbun (Japanese newspaper) corpus. The results show that the proposed method yields lower perplexity than the comparative baselines, including word-based and class-based n-gram LMs, their interpolation, a cache-based LM, an n-gram-based topic-dependent LM, and a topic-dependent LM based on Latent Dirichlet Allocation (LDA). N-best list rescoring was also conducted to validate the method's applicability to ASR systems.
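The windowed voting step admits a compact illustration. The following is a minimal sketch, not the authors' implementation: it assumes the LSA-derived noun classes and the per-noun confidence weights have already been computed, and the function name vote_topic, the dictionary layout, the window size, and all toy values are illustrative.

from collections import defaultdict

def vote_topic(history, noun_class, confidence, window=20):
    # Decide the topic of the next event from a fixed-size history window:
    # each noun in the window votes for its LSA-derived class, and each
    # vote is weighted by that noun's confidence score.
    votes = defaultdict(float)
    for word in history[-window:]:
        if word in noun_class:                       # only nouns carry topic votes
            votes[noun_class[word]] += confidence.get(word, 0.0)
    if not votes:
        return None                                  # no nouns in the window
    return max(votes, key=votes.get)

# Toy example (classes and weights are hypothetical): finance nouns
# outvote sports nouns, so class 0 is chosen as the topic.
noun_class = {"stock": 0, "market": 0, "bond": 0, "goal": 1, "team": 1}
confidence = {"stock": 0.9, "market": 0.8, "bond": 0.7, "goal": 0.9, "team": 0.6}
history = ["the", "stock", "market", "fell", "as", "bond", "yields", "rose"]
print(vote_topic(history, noun_class, confidence))   # -> 0

In the paper's setting, the winning class would presumably select the topic-dependent component of the LM, while the no-noun case would fall back on a topic-independent model.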