ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
We address the data sparseness problem in language modeling (LM). Using a class LM is one way to alleviate this problem: in a class LM, infrequent words are supported by more frequent words in the same class. This paper investigates a class LM based on latent semantic analysis (LSA). A word-document matrix is usually used to represent a corpus in the LSA framework, but this matrix ignores word order within sentences. We propose several word co-occurrence matrices that preserve word order. Building on these matrices, we define a context dependent class (CDC) LM that distinguishes word classes according to their context in the sentence. Experiments on the Wall Street Journal (WSJ) corpus show that the word co-occurrence matrices outperform the word-document matrix, and that the CDC LM achieves lower perplexity than the traditional LSA-based class LM.
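To illustrate the core idea, the sketch below builds one possible order-preserving word co-occurrence matrix (counting which word immediately precedes which) and applies SVD to obtain low-dimensional word representations, as in LSA. This is a hedged illustration on a toy corpus, not the paper's exact construction: the specific matrices, the WSJ preprocessing, and the clustering into classes are assumptions here.

```python
import numpy as np

# Toy corpus standing in for WSJ (assumption: whitespace tokenization).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# Build the vocabulary and an index for each word.
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# One order-preserving word co-occurrence matrix:
# C[i, j] counts how often word j immediately precedes word i.
# (Unlike a word-document matrix, this keeps local word order.)
C = np.zeros((V, V))
for s in corpus:
    toks = s.split()
    for prev, cur in zip(toks, toks[1:]):
        C[idx[cur], idx[prev]] += 1

# LSA step: truncated SVD gives low-rank word representations,
# which could then be clustered into classes for a class LM.
U, S, Vt = np.linalg.svd(C, full_matrices=False)
k = 2  # assumed number of latent dimensions for this toy example
word_vectors = U[:, :k] * S[:k]

print(C[idx["sat"], idx["cat"]])   # "cat sat" occurs once
print(word_vectors.shape)
```

Because rows are conditioned on the preceding word, words with similar left contexts (here "cat" and "dog") end up with similar rows, which is the property a context dependent class assignment can exploit.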