Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
A hierarchical phrase-based model for statistical machine translation
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Paraphrasing with bilingual parallel corpora
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Improved statistical machine translation using paraphrases
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Phrasetable smoothing for statistical machine translation
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Syntactic constraints on paraphrases extracted from parallel corpora
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Lattice-based minimum error rate training for statistical machine translation
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Improved statistical machine translation using monolingually-derived paraphrases
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Statistical Machine Translation
Statistical Machine Translation
A word-class approach to labeling PSCFG rules for machine translation
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Incorporating source-language paraphrases into phrase-based SMT with confusion networks
SSST-5 Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Power-law distributions for paraphrases extracted from bilingual corpora
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Improve SMT quality with automatically extracted paraphrase rules
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Joint learning of a dual SMT system for paraphrase generation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Distributional phrasal paraphrase generation for statistical machine translation
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
Hi-index | 0.00 |
This paper describes how to cluster together the phrases of a phrase-based statistical machine translation (SMT) system, using information in the phrase table itself. The clustering is symmetric and recursive: it is applied both to source-language and target-language phrases, and the clustering in one language helps determine the clustering in the other. The phrase clusters have many possible uses. This paper looks at one of these uses: smoothing the conditional translation model (TM) probabilities employed by the SMT system. We incorporated phrase-cluster-derived probability estimates into a baseline loglinear feature combination that included relative frequency and lexically-weighted conditional probability estimates. In Chinese-English (C-E) and French-English (F-E) learning curve experiments, we obtained a gain over the baseline in 29 of 30 tests, with a maximum gain of 0.55 BLEU points (though most gains were fairly small). The largest gains came with medium (200--400K sentence pairs) rather than with small (less than 100K sentence pairs) amounts of training data, contrary to what one would expect from the paraphrasing literature. We have only begun to explore the original smoothing approach described here.