We present a new method for aligning sentences with their translations in a parallel bilingual corpus. Previous approaches have generally been based on either sentence length or word correspondences. Sentence-length-based methods are relatively fast and fairly accurate. Word-correspondence-based methods are generally more accurate but much slower, and usually depend on cognates or a bilingual lexicon. Our method adapts and combines these approaches, achieving high accuracy at a modest computational cost, and requires no knowledge of the languages or the corpus beyond division into words and sentences.
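To make the sentence-length-based family concrete, here is a minimal sketch of a length-only aligner in the style of Gale and Church: dynamic programming over "beads" (groups of i source and j target sentences), where each bead is scored by a prior on its type plus how well the character lengths match under a normal model. This illustrates only the length-based baseline, not the combined method described above. The parameter values `C` and `S2`, the bead priors, and the function names `length_cost` and `align` are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Tuple

# Assumed parameters for character-length matching (illustrative defaults).
C = 1.0    # expected target characters per source character
S2 = 6.8   # assumed variance of that ratio, per character

# Assumed prior probabilities for each bead type (source count, target count).
BEAD_PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}


def length_cost(l1: int, l2: int) -> float:
    """Negative log-probability that lengths l1 and l2 are translations."""
    if l1 == 0 and l2 == 0:
        return 0.0
    # Normalize by the mean of both lengths so 1-0 and 0-1 beads are defined.
    mean = (l1 + l2 / C) / 2.0
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    # Two-tailed probability that a standard normal exceeds |delta|.
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p, 1e-300))


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return the minimum-cost sequence of beads as (source ids, target ids)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                if di > i or dj > j:
                    continue
                prev = cost[i - di][j - dj]
                if prev == inf:
                    continue
                l1 = sum(len(s) for s in src[i - di:i])
                l2 = sum(len(s) for s in tgt[j - dj:j])
                c = prev - math.log(prior) + length_cost(l1, l2)
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (di, dj)
    # Trace the best path back from the end of both texts.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


if __name__ == "__main__":
    src = ["Hello world.", "How are you?"]
    tgt = ["Bonjour le monde.", "Comment allez-vous ?"]
    print(align(src, tgt))  # [([0], [0]), ([1], [1])]
```

Because the cost uses only lengths, this kind of aligner is fast and needs no lexicon, but it cannot distinguish two candidate pairings of similar length; that weakness is what word-correspondence scores are meant to address.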