This paper presents a study on optimizing the sentence pair alignment scores of a bilingual sentence alignment module. Five candidate scores based on perplexity and sentence length are introduced and tested. A linear regression model over those candidates is then proposed and trained to predict the alignment quality scores that human subjects assigned to sentence pairs. Experiments are carried out on data automatically collected from the Internet. The correlation between the scores generated by the linear regression model and the scores from human subjects falls within the range of the inter-subject agreement correlations. Pearson's correlation ranges from 0.53 to 0.72 in our experiments.
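The regression-and-correlation setup described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the five candidate scores and the human quality scores are synthetic stand-ins, the names `fit_linear_regression`, `predict`, `pearson`, and `true_weights` are hypothetical, and ordinary least squares is assumed as the fitting method because the abstract does not specify one.

```python
import random


def fit_linear_regression(features, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Each feature row gets a leading 1.0, so weights[0] is the intercept.
    """
    rows = [[1.0] + list(row) for row in features]
    n = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict(weights, row):
    """Intercept plus the weighted sum of the candidate scores."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Synthetic stand-in data (assumption): 200 sentence pairs, each with five
# candidate scores, and noisy "human" quality scores. Not the paper's data.
rng = random.Random(0)
true_weights = [0.5, 1.0, -0.8, 0.3, 0.6, -0.2]  # hypothetical values
candidates = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
human_scores = [predict(true_weights, row) + rng.gauss(0, 1) for row in candidates]

# Train on the first 150 pairs and evaluate on the remaining 50.
train_x, test_x = candidates[:150], candidates[150:]
train_y, test_y = human_scores[:150], human_scores[150:]

weights = fit_linear_regression(train_x, train_y)
predicted = [predict(weights, row) for row in test_x]
r = pearson(predicted, test_y)
print(f"Pearson r on held-out pairs: {r:.2f}")
```

Under this setup, the held-out `r` would be compared against the correlations observed between pairs of human annotators, which is the sense in which the abstract reports the model's correlation as falling within the inter-subject range.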