In this paper a robust, adaptive approach for mining parallel sentences from a bilingual comparable news collection is described. Sentence length models and lexicon-based models are combined under a maximum likelihood criterion. Specific models are proposed to handle insertions and deletions that are frequent in bilingual data collected from the web. The proposed approach is adaptive, updating the translation lexicon iteratively using the mined parallel data to get better vocabulary coverage and translation probability parameter estimation. Experiments are carried out on 10 years of Xinhua bilingual news collection. Using the mined data, we get significant improvement in word-to-word alignment accuracy in machine translation modeling.
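
As a rough illustration of how a sentence length model and a lexicon-based model might be combined into one likelihood score for a candidate sentence pair, the following minimal sketch may help. It is not the paper's formulation: the Gaussian length model, the IBM Model 1 style lexicon term, the `<NULL>` source word, the probability floor, and all function and parameter names (`length_log_prob`, `lexicon_log_prob`, `pair_score`, `ratio`, `sigma`, `floor`) are assumptions made for this example, and the paper's specific insertion and deletion models are not reproduced.

```python
import math

def length_log_prob(src_len, tgt_len, ratio=1.0, sigma=0.5):
    """Log density of the target length given the source length.

    Assumed form: a Gaussian on the length difference whose variance
    grows linearly with the source length.
    """
    if src_len == 0:
        return float("-inf")
    z = (tgt_len - ratio * src_len) / (sigma * math.sqrt(src_len))
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi * src_len))

def lexicon_log_prob(src_tokens, tgt_tokens, lexicon, floor=1e-6):
    """IBM Model 1 style log-probability of the target given the source.

    Each target word is generated by a uniform mixture over the source
    words plus a hypothetical <NULL> word. Probabilities are floored so
    that unseen word pairs do not yield log(0).
    """
    candidates = ["<NULL>"] + list(src_tokens)
    total = 0.0
    for t in tgt_tokens:
        p = sum(lexicon.get((s, t), 0.0) for s in candidates) / len(candidates)
        total += math.log(max(p, floor))
    return total

def pair_score(src_tokens, tgt_tokens, lexicon):
    """Combined log-likelihood: length term plus lexicon term."""
    return (length_log_prob(len(src_tokens), len(tgt_tokens))
            + lexicon_log_prob(src_tokens, tgt_tokens, lexicon))

# Toy lexicon with made-up probabilities, keyed by (source, target).
toy_lexicon = {
    ("maison", "house"): 0.9,
    ("la", "the"): 0.8,
    ("<NULL>", "the"): 0.1,
}
parallel = pair_score(["la", "maison"], ["the", "house"], toy_lexicon)
unrelated = pair_score(["la", "maison"], ["green", "ideas"], toy_lexicon)
print(parallel > unrelated)
```

In an adaptive setup like the one described, pairs scoring above some threshold would be added to the training data and the lexicon re-estimated from them before the next mining pass; the threshold and the re-estimation procedure are left out of this sketch.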