Multi-class composite N-gram based on connection direction

  • Authors:
  • H. Yamamoto;Y. Sagisaka

  • Affiliations:
  • ATR Interpreting Telephony Res. Labs., Kyoto, Japan;-

  • Venue:
  • ICASSP '99 Proceedings of the Acoustics, Speech, and Signal Processing, 1999. on 1999 IEEE International Conference - Volume 01
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting word neighboring characteristics into word-preceding and following directions, multiple (two-dimensional) word classes are assigned to each word, In each side, word classes are merged into larger clusters independently according to preceding or following word distributions. This word-clustering can provide more efficient and statistically reliable word clusters. Further, we extend it to a multi-class composite N-gram that unit is a multi-class 2-gram and joined word. The multi-class composite N-gram showed better performance both in perplexity and recognition rates with one thousandth smaller size than conventional word 2-grams.