Class-based n-gram models of natural language
Computational Linguistics
Spontaneous dialogue speech recognition using cross-word context constrained word graphs
ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Variable-order N-gram generation by word-class splitting and consecutive word grouping
ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Multi-class composite N-gram based on connection direction
ICASSP '99 Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Exploring asymmetric clustering for statistical language modeling
ACL '02 Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
ACS'08 Proceedings of the 8th Conference on Applied Computer Science
Multi-speaker language modeling
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
In this paper, a new language model, the Multi-Class Composite N-gram, is proposed to address the data-sparseness problem of spoken language, for which it is difficult to collect large amounts of training data. The Multi-Class Composite N-gram maintains accurate word prediction and reliability under sparse data with a compact model size, based on multiple word clusters called Multi-Classes. In a Multi-Class, the statistical connectivity at each position of the N-gram is regarded as a word attribute, and a separate word cluster is created to represent each positional attribute. Furthermore, by introducing higher-order word N-grams through the grouping of frequent word successions, Multi-Class N-grams are extended to Multi-Class Composite N-grams. In experiments, Multi-Class Composite N-grams achieve 9.5% lower perplexity and a 16% lower word error rate in speech recognition, with a 40% smaller parameter size than conventional word 3-grams.
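The positional-clustering idea in the abstract can be illustrated with a minimal sketch: each word receives two class assignments, one describing how it behaves as the predicted word and one describing how it behaves as the conditioning context, and the bigram probability factors through these classes. All words, class labels, and probabilities below are hypothetical toy values, not data from the paper.

```python
# Toy multi-class bigram sketch (hypothetical vocabulary and probabilities).
# Each word has a "target" class (its attribute when being predicted) and a
# "context" class (its attribute when conditioning the next word), mirroring
# the per-position word attributes described in the abstract.

target_class = {"a": "DET", "the": "DET", "cat": "NOUN", "dog": "NOUN", "runs": "VERB"}
context_class = {"a": "DET_ctx", "the": "DET_ctx", "cat": "SUBJ", "dog": "SUBJ", "runs": "VERB_ctx"}

# Hypothetical class-transition probabilities P(target class | context class)
# and emission probabilities P(word | target class).
class_bigram = {("DET_ctx", "NOUN"): 0.9, ("SUBJ", "VERB"): 0.8}
emission = {"a": 0.5, "the": 0.5, "cat": 0.5, "dog": 0.5, "runs": 1.0}

def prob(prev_word, word):
    """P(word | prev_word) ~= P(class(word) | ctx_class(prev_word)) * P(word | class(word))."""
    c_prev = context_class[prev_word]
    c_cur = target_class[word]
    return class_bigram.get((c_prev, c_cur), 0.0) * emission[word]

print(prob("the", "cat"))   # 0.9 * 0.5 = 0.45
print(prob("cat", "runs"))  # 0.8 * 1.0 = 0.8
```

Because a word's context class can differ from its target class, asymmetric connection behavior (the "connection direction" in the title) can be modeled with far fewer parameters than a full word bigram table; grouping frequent word successions into single composite units would then reintroduce higher-order word statistics on top of this factorization.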