A new word-clustering technique is proposed to efficiently build statistically salient class 2-grams from language corpora. By splitting each word's neighboring characteristics into word-preceding and word-following directions, multiple (two-dimensional) word classes are assigned to each word. On each side, word classes are merged into larger clusters independently, according to the distributions of preceding or following words. This word clustering provides more efficient and statistically reliable word clusters. We further extend it to a multi-class composite N-gram whose units are multi-class 2-grams and joined words. The multi-class composite N-gram showed better performance in both perplexity and recognition rate, at one-thousandth the size of conventional word 2-grams.
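The two-directional clustering above can be illustrated with a minimal sketch. This is not the authors' implementation: the class assignments are hypothetical toy values, and the model simply factors the bigram probability as P(C_follow(w2) | C_precede(w1)) x P(w2 | C_follow(w2)), using a separate class for a word when it acts as history (preceding side) and when it is predicted (following side).

```python
from collections import defaultdict

# Hypothetical two-sided class assignments: each word has one class for when
# it precedes (acts as history) and one for when it follows (is predicted).
precede_class = {"the": 0, "a": 0, "dog": 1, "cat": 1}
follow_class = {"the": 0, "a": 0, "dog": 1, "cat": 1}

def train(corpus):
    """Count class-transition and word-emission statistics from token lists."""
    cc = defaultdict(int)    # (C_precede(w1), C_follow(w2)) transition counts
    c1 = defaultdict(int)    # C_precede(w1) history counts
    emit = defaultdict(int)  # (C_follow(w2), w2) emission counts
    cf = defaultdict(int)    # C_follow(w2) class counts
    for sent in corpus:
        for w1, w2 in zip(sent, sent[1:]):
            cc[(precede_class[w1], follow_class[w2])] += 1
            c1[precede_class[w1]] += 1
            emit[(follow_class[w2], w2)] += 1
            cf[follow_class[w2]] += 1
    return cc, c1, emit, cf

def prob(w1, w2, model):
    """P(w2 | w1) ~= P(C_follow(w2) | C_precede(w1)) * P(w2 | C_follow(w2))."""
    cc, c1, emit, cf = model
    p_class = cc[(precede_class[w1], follow_class[w2])] / c1[precede_class[w1]]
    p_word = emit[(follow_class[w2], w2)] / cf[follow_class[w2]]
    return p_class * p_word

corpus = [["the", "dog"], ["a", "cat"], ["the", "cat"]]
model = train(corpus)
p = prob("the", "dog", model)  # (3/3) * (1/3) = 1/3 on this toy corpus
```

Because probabilities are stored per class pair rather than per word pair, the model needs far fewer parameters than a word 2-gram, which is the source of the size reduction the abstract reports.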