Text-translation alignment

Authors:
Martin Kay; Martin Röscheisen
Affiliations:
Xerox Palo Alto Research Center and Stanford University; Xerox Palo Alto Research Center and Technical University of Munich
Venue:
Computational Linguistics - Special issue on using large corpora: I
Year:
1993

Abstract

We present an algorithm for aligning texts with their translations that is based only on internal evidence. The algorithm is a relaxation process resting on a notion of which word in one text corresponds to which word in the other, a notion based essentially on the similarity of the words' distributions. It exploits a partial alignment at the word level to induce a maximum-likelihood alignment at the sentence level, which in turn is used in the next iteration to refine the word-level estimate. The algorithm appears to converge to the correct sentence alignment in only a few iterations.
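To make the iteration concrete, here is a minimal Python sketch under our own simplifying assumptions; it is not the authors' procedure. Word correspondence is scored with a Dice coefficient over the sentences in which each word occurs, counting co-occurrence only inside the sentence pairs currently allowed. The sentence level is a 1:1 monotone dynamic programme that maximizes summed word-pair scores (a stand-in for the maximum-likelihood step, not a true likelihood). After each pass, the allowed pairs are narrowed to the anchors found plus the gaps between them. The function names (`word_table`, `sentence_alignment`, `narrow`, `align`), the parameters `threshold`, `min_freq`, `radius`, and `max_iter`, and the toy sentences are all hypothetical.

```python
# Simplified, hypothetical sketch of word/sentence relaxation alignment.
from collections import defaultdict


def occurrences(text):
    """Map each word to the set of indices of the sentences containing it."""
    occ = defaultdict(set)
    for idx, sentence in enumerate(text):
        for word in sentence:
            occ[word].add(idx)
    return occ


def word_table(src, tgt, allowed, threshold=0.7, min_freq=2):
    """Word pairs whose sentence distributions are similar (Dice score),
    counting co-occurrence only inside the allowed sentence pairs."""
    src_occ, tgt_occ = occurrences(src), occurrences(tgt)
    table = {}
    for s, s_sents in src_occ.items():
        for t, t_sents in tgt_occ.items():
            if min(len(s_sents), len(t_sents)) < min_freq:
                continue  # too rare to have an informative distribution
            hits = [(i, j) for i in s_sents for j in t_sents if (i, j) in allowed]
            # min() caps the co-occurrence count so Dice stays within [0, 1].
            c = min(len({i for i, _ in hits}), len({j for _, j in hits}))
            dice = 2 * c / (len(s_sents) + len(t_sents))
            if dice >= threshold:
                table[(s, t)] = dice
    return table


def sentence_alignment(src, tgt, table, allowed):
    """Monotone dynamic programme: choose 1:1 sentence pairs (others skipped)
    that maximize the summed scores of the word pairs they share."""
    n, m = len(src), len(tgt)

    def sim(i, j):
        if (i, j) not in allowed:
            return 0.0
        return sum(score for (s, t), score in table.items()
                   if s in src[i] and t in tgt[j])

    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + sim(i - 1, j - 1),
                             best[i - 1][j], best[i][j - 1])
    # Trace back, keeping only diagonal steps supported by word evidence.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        score = sim(i - 1, j - 1)
        if score > 0 and best[i][j] == best[i - 1][j - 1] + score:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def narrow(pairs, n, m):
    """Keep the anchors plus every pair lying strictly between two consecutive anchors."""
    allowed = set(pairs)
    bounds = [(-1, -1)] + sorted(pairs) + [(n, m)]
    for (a0, b0), (a1, b1) in zip(bounds, bounds[1:]):
        allowed.update((i, j) for i in range(a0 + 1, a1)
                       for j in range(b0 + 1, b1))
    return allowed


def align(src, tgt, radius=1, max_iter=5):
    """Alternate word-level and sentence-level estimates until they stop changing."""
    n, m = len(src), len(tgt)
    # Initial envelope: sentence pairs near the length-scaled diagonal.
    allowed = {(i, j) for i in range(n) for j in range(m)
               if abs(i * m / n - j) <= radius}
    pairs = None
    for _ in range(max_iter):
        table = word_table(src, tgt, allowed)
        new_pairs = sentence_alignment(src, tgt, table, allowed)
        if new_pairs == pairs:
            break  # the sentence alignment no longer changes
        pairs = new_pairs
        allowed = narrow(pairs, n, m)
    return pairs


# Invented toy data: the target contains one extra sentence, "il pleut".
SRC = [s.split() for s in ("treaty signed yesterday", "council met today",
       "member states agreed", "treaty enters force", "council votes soon",
       "member joins later")]
TGT = [s.split() for s in ("traite signe hier", "conseil reuni aujourdhui",
       "membre etat accord", "il pleut", "traite entre vigueur",
       "conseil vote bientot", "membre adhere ensuite")]

if __name__ == "__main__":
    print(align(SRC, TGT))  # [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 6)]
```

On the toy texts, only treaty/traite, council/conseil, and member/membre pass the threshold, so the extra target sentence (index 3) is left unaligned; the second pass reproduces the first and the loop stops. Restricting co-occurrence counts to the allowed sentence pairs is what keeps rarely coinciding words, such as treaty and conseil, from being paired.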