In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. Because we make no use of the lexical details of the sentences, the alignment computation is fast and therefore practical for application to very large collections of text. We have used this technique to align several million sentences in the English-French Hansard corpora and have achieved an accuracy in excess of 99% on a randomly selected set of 1000 sentence pairs that we checked by hand. We show that even without the benefit of anchor points, the correlation between the lengths of aligned sentences is strong enough that we should expect to achieve an accuracy of between 96% and 97%. Thus, the technique may be applicable to a wider variety of texts than we have yet tried.
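
To make the idea of aligning on token counts alone concrete, here is a minimal sketch in Python. It is not the model described in the paper: the abstract does not give the probability model, so the cost function (absolute log ratio of token counts), the set of allowed groupings, and the penalty values in `BEAD_PENALTY` are all assumptions chosen for illustration, and anchor points are ignored. The sketch uses dynamic programming to find the cheapest sequence of groupings that covers both texts in order.

```python
import math

# Hypothetical penalties per grouping (source sentences, target sentences).
# These values are illustrative assumptions, not estimates from the paper.
BEAD_PENALTY = {(1, 1): 0.0, (2, 1): 1.0, (1, 2): 1.0, (1, 0): 3.0, (0, 1): 3.0}


def bead_cost(src_tokens, tgt_tokens, penalty):
    """Cost of grouping src_tokens source tokens with tgt_tokens target tokens.

    Assumed cost: absolute log ratio of the token counts (+1 avoids log(0)
    when one side is empty) plus a fixed penalty for the grouping type.
    """
    return abs(math.log((tgt_tokens + 1) / (src_tokens + 1))) + penalty


def align(src_lens, tgt_lens):
    """Align two texts given only per-sentence token counts.

    Returns a list of (source_indices, target_indices) pairs in text order.
    """
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]   # best[i][j]: min cost of prefixes
    back = [[None] * (m + 1) for _ in range(n + 1)]  # predecessor cell for traceback
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            # Try extending the partial alignment with each allowed grouping.
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cost = best[i][j] + bead_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), penalty
                )
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)

    # Walk back from the end of both texts to recover the groupings.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    # Two short source sentences whose combined length matches one target sentence.
    print(align([10, 12, 30], [23, 31]))
```

Because the search looks only at sentence lengths, its running time grows with the product of the two sentence counts and is independent of vocabulary, which is consistent with the abstract's point that ignoring lexical detail keeps the computation fast. A realistic implementation would replace the hand-set penalties with probabilities estimated from data and would restrict the search between anchor points where they are available.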