The n-gram model is a stochastic model that predicts the next word (the predicted word) given the previous words (the conditional words) in a word sequence. The cluster n-gram model is a variant of the n-gram model in which similar words are grouped into the same cluster. It has been demonstrated that using different clusters for predicted and conditional words yields cluster models superior to classical cluster models, which use the same clusters for both. This is the basis of the asymmetric cluster model (ACM) discussed in our study. In this paper, we first present a formal definition of the ACM and then describe in detail the methodology for constructing it. The effectiveness of the ACM is evaluated on a realistic application, namely Japanese Kana-Kanji conversion. Experimental results show substantial improvements of the ACM over classical cluster models and word n-gram models at the same model size. Our analysis shows that the high performance of the ACM stems from the asymmetry of the model.
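To make the asymmetry concrete, the following is a minimal sketch (not the paper's implementation) of a cluster bigram in which the conditional word and the predicted word are mapped through two *different* clusterings, factoring the probability as P(w_i | w_{i-1}) ≈ P(C_pred(w_i) | C_cond(w_{i-1})) · P(w_i | C_pred(w_i)). The class and variable names are illustrative assumptions; a classical (symmetric) cluster model is recovered by passing the same clustering for both roles.

```python
from collections import defaultdict

class AsymmetricClusterBigram:
    """Toy asymmetric cluster bigram model (illustrative sketch only).

    cond_clusters: dict mapping each word to its cluster when it is the
                   conditional (history) word.
    pred_clusters: dict mapping each word to its cluster when it is the
                   predicted word. The two mappings may differ (asymmetry).
    """

    def __init__(self, cond_clusters, pred_clusters):
        self.cond = cond_clusters
        self.pred = pred_clusters
        # Counts for P(pred_cluster | cond_cluster)
        self.cluster_bigram = defaultdict(lambda: defaultdict(int))
        self.cond_cluster_count = defaultdict(int)
        # Counts for P(word | pred_cluster)
        self.word_in_pred = defaultdict(lambda: defaultdict(int))
        self.pred_cluster_count = defaultdict(int)

    def train(self, corpus):
        # Accumulate maximum-likelihood counts over adjacent word pairs.
        for prev, cur in zip(corpus, corpus[1:]):
            cc, pc = self.cond[prev], self.pred[cur]
            self.cluster_bigram[cc][pc] += 1
            self.cond_cluster_count[cc] += 1
            self.word_in_pred[pc][cur] += 1
            self.pred_cluster_count[pc] += 1

    def prob(self, cur, prev):
        # P(cur | prev) ≈ P(C_pred(cur) | C_cond(prev)) * P(cur | C_pred(cur))
        cc, pc = self.cond[prev], self.pred[cur]
        if self.cond_cluster_count[cc] == 0 or self.pred_cluster_count[pc] == 0:
            return 0.0
        p_cluster = self.cluster_bigram[cc][pc] / self.cond_cluster_count[cc]
        p_word = self.word_in_pred[pc][cur] / self.pred_cluster_count[pc]
        return p_cluster * p_word
```

Because the cluster probability table is indexed by (conditional cluster, predicted cluster) rather than by word pairs, the model stays compact; the asymmetry lets each clustering be optimized for its own role, which is the effect the abstract credits for the ACM's gains.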