Context dependent class language model based on word co-occurrence matrix in LSA framework for speech recognition

  • Authors:
  • Welly Naptali; Masatoshi Tsuchiya; Seiichi Nakagawa

  • Affiliations:
  • Toyohashi University of Technology, Department of Information and Computer Sciences, Toyohashi, Aichi, Japan; Toyohashi University of Technology, Information and Media Center, Toyohashi, Aichi, Japan; Toyohashi University of Technology, Department of Information and Computer Sciences, Toyohashi, Aichi, Japan

  • Venue:
  • ACS'08 Proceedings of the 8th Conference on Applied Computer Science
  • Year:
  • 2008

Abstract

We address the data sparseness problem in language modeling (LM). Using a class LM is one way to mitigate this problem: in a class LM, infrequent words are supported by more frequent words in the same class. This paper investigates a class LM based on latent semantic analysis (LSA). In the LSA framework, a corpus is usually represented by a word-document matrix, which ignores word order within sentences. We propose several word co-occurrence matrices that preserve word order. Building on these matrices, we define a context dependent class (CDC) LM that distinguishes word classes according to their context in the sentence. Experiments on the Wall Street Journal (WSJ) corpus show that the word co-occurrence matrices work better than the word-document matrix, and that the CDC LM achieves better perplexity than a traditional LSA-based class LM.
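
The sketch below illustrates the general pipeline the abstract describes: build an order-preserving word co-occurrence matrix, apply LSA (truncated SVD) to obtain word vectors, and cluster them into word classes. It is a minimal illustration under assumed choices, not the paper's exact method: the toy corpus, the left/right-neighbour matrix construction, and the k-means-style class induction are all stand-ins for the specific matrix variants and CDC formulation defined in the paper.

```python
import numpy as np

# Toy corpus (stand-in for WSJ); sentence boundaries are kept so word order matters.
corpus = [
    "the stock market rose sharply".split(),
    "the bond market fell sharply".split(),
    "investors sold the stock".split(),
    "investors bought the bond".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# One plausible order-preserving co-occurrence matrix: rows are words, columns
# are left-neighbour and right-neighbour counts stacked side by side, so
# "A B" and "B A" produce different entries (unlike a word-document matrix).
left = np.zeros((V, V))
right = np.zeros((V, V))
for sent in corpus:
    for i, w in enumerate(sent):
        if i > 0:
            left[idx[w], idx[sent[i - 1]]] += 1
        if i < len(sent) - 1:
            right[idx[w], idx[sent[i + 1]]] += 1
C = np.hstack([left, right])          # V x 2V order-aware co-occurrence counts

# LSA step: truncated SVD projects each word into a low-dimensional space.
k = 3
U, S, Vt = np.linalg.svd(C, full_matrices=False)
word_vecs = U[:, :k] * S[:k]          # LSA word representations

# Tiny k-means over the LSA vectors to induce word classes (a common choice;
# the paper's class-induction procedure may differ).
rng = np.random.default_rng(0)
num_classes = 3
centers = word_vecs[rng.choice(V, num_classes, replace=False)]
for _ in range(20):
    dists = np.linalg.norm(word_vecs[:, None, :] - centers[None, :, :], axis=2)
    classes = dists.argmin(axis=1)
    for c in range(num_classes):
        if np.any(classes == c):
            centers[c] = word_vecs[classes == c].mean(axis=0)

for w in vocab:
    print(f"{w:10s} -> class {classes[idx[w]]}")
```

In a class LM these induced classes replace individual words in the n-gram history, so statistics of frequent class members back off gracefully to infrequent ones; the CDC variant additionally lets a word's class depend on its sentence context.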