This paper presents a novel phrase alignment strategy that combines linguistic knowledge with cooccurrence measures extracted from bilingual corpora. The algorithm is divided into four stages: phrase selection and classification, phrase alignment, one-to-one word alignment, and postprocessing. The first stage selects a linguistically derived set of phrases that convey a unified meaning during translation and are therefore aligned together in parallel texts. These phrases include verb phrases, idiomatic expressions and date expressions. The second stage produces very high precision links between the selected phrases in the two languages. The third stage performs a statistical word alignment of the remaining unaligned tokens using association measures and link probabilities, and the fourth stage takes final decisions on any tokens still unaligned, based on linguistic knowledge. Experiments are reported for an English-Spanish parallel corpus, with a detailed description of the evaluation measure and the manual reference used. Results show that phrase cooccurrence measures convey information complementary to word cooccurrences and provide stronger evidence of a correct alignment, successfully introducing linguistic knowledge into a statistical word alignment scheme. Precision, Recall and Alignment Error Rate (AER) results are presented and outperform those of state-of-the-art alignment algorithms.
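As an illustration of the evaluation measures named above, the following minimal sketch computes Precision, Recall and AER for a single sentence pair. It assumes the commonly used definition based on sure (S) and possible (P) reference links; the abstract does not state which exact definition the paper adopts, so this may differ from the one used in the experiments. The function name `alignment_scores` and the toy links are hypothetical.

```python
from typing import Dict, Set, Tuple

# A link pairs a source token position with a target token position.
Link = Tuple[int, int]


def alignment_scores(
    hypothesis: Set[Link], sure: Set[Link], possible: Set[Link]
) -> Dict[str, float]:
    """Precision, Recall and AER under the sure/possible convention.

    Assumed definition (not taken from the paper itself):
      precision = |A & P| / |A|
      recall    = |A & S| / |S|
      AER       = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    """
    if not hypothesis or not sure:
        raise ValueError("hypothesis and sure link sets must be non-empty")

    # By convention every sure link also counts as a possible link.
    possible = possible | sure

    a_and_s = len(hypothesis & sure)
    a_and_p = len(hypothesis & possible)

    return {
        "precision": a_and_p / len(hypothesis),
        "recall": a_and_s / len(sure),
        "aer": 1.0 - (a_and_s + a_and_p) / (len(hypothesis) + len(sure)),
    }


# Toy example: three proposed links, two sure links, one extra possible link.
scores = alignment_scores(
    hypothesis={(0, 0), (1, 1), (2, 3)},
    sure={(0, 0), (1, 1)},
    possible={(2, 2)},
)
```

In this toy case two of the three proposed links match the reference, so precision is 2/3, recall is 1 and AER is 0.2. Corpus-level figures would normally be obtained by summing the counts over all sentence pairs before dividing, rather than by averaging per-sentence scores.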