The first step in most empirical work in multilingual NLP is to construct maps of the correspondence between texts and their translations (bitext maps). The Smooth Injective Map Recognizer (SIMR) algorithm presented here is a generic pattern recognition algorithm that is particularly well suited to mapping bitext correspondence. SIMR is faster and significantly more accurate than other algorithms in the literature. The algorithm is robust enough to use on noisy texts, such as those resulting from OCR input, and on translations that are not very literal. SIMR encapsulates its language-specific heuristics, so it can be ported to any language pair with minimal effort.
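To make the notion of a bitext map concrete, here is a minimal Python sketch. It is not the SIMR algorithm, which the abstract does not specify. It assumes a bitext map can be represented as a list of (source offset, target offset) pairs measured in characters, and it shows what "injective" means for such a map and how a map can be used to project a position in one text onto the other. The function names `is_injective`, `is_monotonic`, and `project`, and the sample points, are illustrative assumptions.

```python
from bisect import bisect_right

# Assumption: a bitext map is a list of (source_offset, target_offset)
# pairs marking positions believed to correspond in the two texts.


def is_injective(points):
    """Return True if no source offset and no target offset is used twice."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return len(set(xs)) == len(xs) and len(set(ys)) == len(ys)


def is_monotonic(points):
    """Return True if target offsets strictly increase with source offsets."""
    ordered = sorted(points)
    return all(a[0] < b[0] and a[1] < b[1]
               for a, b in zip(ordered, ordered[1:]))


def project(points, x):
    """Estimate the target offset for source offset x.

    Linear interpolation between the two neighbouring map points;
    positions outside the map are clamped to the nearest end point.
    Assumes a non-empty, injective map (no repeated source offsets).
    """
    ordered = sorted(points)
    xs = [p[0] for p in ordered]
    i = bisect_right(xs, x)
    if i == 0:
        return float(ordered[0][1])
    if i == len(ordered):
        return float(ordered[-1][1])
    (x0, y0), (x1, y1) = ordered[i - 1], ordered[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical map for a short bitext (made-up offsets).
sample_map = [(0, 0), (100, 120), (200, 230)]
```

With this sample map, `project(sample_map, 50)` interpolates halfway between the first two points. A map containing both `(100, 120)` and `(100, 130)` would fail `is_injective`, because one source position would correspond to two target positions.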