Aligning sentences in bilingual corpora using lexical information

Authors: Stanley F. Chen
Affiliations: Harvard University, Cambridge, MA
Venue: ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Year: 1993

References (8)

A statistical approach to machine translation. Computational Linguistics.
Dynamic Programming.
The mathematics of statistical machine translation: parameter estimation. Computational Linguistics - Special issue on using large corpora: II.
Two languages are more informative than one. ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics.
Aligning sentences in parallel corpora. ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics.
A program for aligning sentences in bilingual corpora. ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics.
Word-sense disambiguation using statistical methods. ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics.
The BICORD system: combining lexical information from bilingual corpora and machine readable dictionaries. COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 3.

Cited by: 59 works

Abstract

In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.
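
The idea of choosing the alignment that maximizes probability under a word-to-word translation model can be illustrated with a small dynamic-programming sketch. This is not the paper's algorithm: the translation table here is fixed and hand-written rather than built on the fly, and the bead types, their prior probabilities, the floor probability, and the scoring rule (each target word explained by its best source word) are all assumptions made for illustration. The names `TOY_TABLE`, `BEAD_LOGPRIOR`, `bead_logprob`, and `align` are hypothetical.

```python
import math

# Hypothetical toy translation probabilities p(target_word | source_word).
TOY_TABLE = {
    ("the", "la"): 0.8,
    ("the", "le"): 0.8,
    ("house", "maison"): 0.9,
    ("cat", "chat"): 0.9,
}
FLOOR = 1e-3  # assumed probability for word pairs missing from the table

# Assumed log priors for bead types: (source sentences, target sentences).
BEAD_LOGPRIOR = {
    (1, 1): math.log(0.90),
    (1, 0): math.log(0.03),
    (0, 1): math.log(0.03),
    (2, 1): math.log(0.02),
    (1, 2): math.log(0.02),
}


def bead_logprob(src_words, tgt_words, table):
    """Log score of one bead under the toy word-to-word model."""
    if not src_words or not tgt_words:
        # Unmatched sentence: every word is charged the floor probability.
        return (len(src_words) + len(tgt_words)) * math.log(FLOOR)
    total = 0.0
    for f in tgt_words:
        # Each target word is explained by its most likely source word.
        best = max(table.get((e, f), FLOOR) for e in src_words)
        total += math.log(best)
    return total


def align(src, tgt, table=TOY_TABLE):
    """Return the highest-scoring sequence of beads.

    src and tgt are lists of tokenized sentences. Each bead in the result
    is a pair (source sentence indices, target sentence indices).
    """
    n, m = len(src), len(tgt)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            for (di, dj), prior in BEAD_LOGPRIOR.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_words = [w for sent in src[i:ni] for w in sent]
                t_words = [w for sent in tgt[j:nj] for w in sent]
                score = best[i][j] + prior + bead_logprob(s_words, t_words, table)
                if score > best[ni][nj]:
                    best[ni][nj] = score
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end of both texts to the start.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    english = [["the", "house"], ["the", "cat"]]
    french = [["la", "maison"], ["le", "chat"]]
    print(align(english, french))
```

Because this toy score only asks how well target words are explained, it does not penalize extra source words in a bead. A fuller model would score both directions or model sentence generation jointly.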