Phrase-based models and class-based models are both variants of the classical n-gram model. In this paper, we propose an approach that merges the two. In the phrase-based part, we extract phrases from a bilingual parallel corpus using a method derived from phrase-based translation models. We then partition these phrases into phrase classes by minimizing the loss of average mutual information with the aid of a count matrix. Our experimental results suggest that phrase-based models capture more key information than word-based models, and that class-based models capture the relationships among similar words or phrases and thereby alleviate the data sparseness problem to some extent.
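The class-partitioning step can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so the code below assumes a plain greedy scheme: start with one class per phrase, and repeatedly merge the pair of classes whose merge loses the least average mutual information, computed from a bigram count matrix. The function names (`average_mutual_information`, `merge_classes`, `greedy_cluster`) and the toy phrases and counts are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def average_mutual_information(counts):
    """Average mutual information (in bits) between adjacent classes.

    counts[i][j] is how often class i is immediately followed by class j.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    left = [sum(counts[i]) for i in range(n)]                        # row sums
    right = [sum(counts[i][j] for i in range(n)) for j in range(n)]  # column sums
    ami = 0.0
    for i in range(n):
        for j in range(n):
            c = counts[i][j]
            if c > 0:
                ami += (c / total) * math.log2(c * total / (left[i] * right[j]))
    return ami


def merge_classes(counts, a, b):
    """Return a new count matrix with class b folded into class a (a < b)."""
    n = len(counts)
    summed = [row[:] for row in counts]
    for j in range(n):            # add row b into row a
        summed[a][j] += summed[b][j]
    for i in range(n):            # add column b into column a
        summed[i][a] += summed[i][b]
    keep = [k for k in range(n) if k != b]
    return [[summed[i][j] for j in keep] for i in keep]


def greedy_cluster(counts, phrases, num_classes):
    """Assumed greedy scheme: merge the pair with the smallest AMI loss."""
    classes = [[p] for p in phrases]
    while len(classes) > num_classes:
        base = average_mutual_information(counts)
        best = None
        for a, b in combinations(range(len(classes)), 2):
            loss = base - average_mutual_information(merge_classes(counts, a, b))
            if best is None or loss < best[0]:
                best = (loss, a, b)
        _, a, b = best
        counts = merge_classes(counts, a, b)
        classes[a] = classes[a] + classes[b]
        del classes[b]
    return classes, counts


# Toy data (hypothetical): two place phrases and two motion phrases that
# only ever occur next to phrases of the other kind.
phrases = ["new york", "los angeles", "flies to", "drives to"]
counts = [
    [0, 0, 4, 4],
    [0, 0, 4, 4],
    [4, 4, 0, 0],
    [4, 4, 0, 0],
]
classes, merged_counts = greedy_cluster(counts, phrases, num_classes=2)
print(classes)
```

In this toy matrix the two place phrases share the same contexts, as do the two motion phrases, so merging within each pair loses no mutual information while any cross-pair merge does. Recomputing the full matrix for every candidate pair is cubic or worse in the number of classes; it is used here only to keep the sketch short.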