We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs with a signal-processing-inspired approach, we detect which segments of the source sentence are translated into segments of the target sentence and which are not. This method enables us to extract useful machine translation training data even from highly non-parallel corpora that contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
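
The abstract does not spell out the detection step, so the following is only an illustrative sketch of what a signal-processing-style fragment detector might look like, not the authors' implementation. It assumes each word of one sentence has already been given a signed score (positive when the word appears to have a translation in the other sentence, negative otherwise). The scores, the moving-average filter, the window width, and the minimum fragment length are all assumptions made for this example, as are the function names.

```python
from typing import List, Tuple


def smooth(signal: List[float], width: int = 5) -> List[float]:
    """Moving-average filter over a centered window, truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def positive_spans(signal: List[float], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the signal stays above zero."""
    spans = []
    start = None
    for i, value in enumerate(signal):
        if value > 0 and start is None:
            start = i  # a positive run begins here
        elif value <= 0 and start is not None:
            if i - start >= min_len:  # keep only runs that are long enough
                spans.append((start, i))
            start = None
    # Close a run that extends to the end of the sentence.
    if start is not None and len(signal) - start >= min_len:
        spans.append((start, len(signal)))
    return spans


def extract_fragments(words: List[str], scores: List[float],
                      width: int = 5, min_len: int = 3) -> List[str]:
    """Smooth the per-word scores, then return the words of each positive run."""
    smoothed = smooth(scores, width)
    return [" ".join(words[s:e]) for s, e in positive_spans(smoothed, min_len)]


if __name__ == "__main__":
    # Hypothetical sentence and hand-made scores, for illustration only.
    words = ["a", "b", "c", "d", "e", "f", "g", "h"]
    scores = [0.9, 0.8, 0.7, 0.9, -1.0, -1.0, -1.0, -1.0]
    print(extract_fragments(words, scores, width=3))
```

Smoothing before thresholding keeps an isolated positive or negative word from splitting or creating a fragment, and the minimum length discards runs too short to be useful as training data. In this sketch the same procedure would be run separately on each sentence of a candidate pair.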