Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Reducing the space requirement of suffix trees
Software—Practice & Experience
Estimating Word Translation Probabilities from Unrelated Monolingual Corpora Using the EM Algorithm
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Optimal suffix tree construction with large alphabets
FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
An IR approach for translating new words from nonparallel, comparable texts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Identifying word translations in non-parallel texts
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Automatic identification of word translations from unrelated English and German corpora
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Using noisy bilingual data for statistical machine translation
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
Multi-level bootstrapping for extracting parallel sentences from a quasi-comparable corpus
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Cross lingual text classification by mining multilingual topics from wikipedia
Proceedings of the fourth ACM international conference on Web search and data mining
An Expectation Maximization algorithm for textual unit alignment
BUCC '11 Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
Hi-index | 0.00 |
We introduce Bilingual Suffix Trees (BST), a data structure that is suitable for exploiting comparable corpora. We discuss algorithms that use BSTs in order to create parallel corpora and learn translations of unseen words from comparable corpora. Starting with a small bilingual dictionary that was derived automatically from a corpus of 5.000 parallel sentences, we have automatically extracted a corpus of 33.926 parallel phrases of size greater than 3, and learned 9 new word translations from a comparable corpus of 1.3M words (100.000 sentences).