A statistical approach to machine translation
Computational Linguistics
Computational Linguistics - Special issue on using large corpora: I
Aligning sentences in parallel corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 3
Translating collocations for bilingual lexicons: a statistical approach
Computational Linguistics
Comparing cross-language query expansion techniques by degrading translation resources
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Termight: Coordinating Humans and Machines in Bilingual Terminology Acquisition
Machine Translation
Bilingual Sentence Alignment: Balancing Robustness and Accuracy
Machine Translation
The Origins of the Translator's Workstation
Machine Translation
Extracting Equivalents from Aligned Parallel Texts: Comparison of Measures of Similarity
IBERAMIA-SBIA '00 Proceedings of the International Joint Conference, 7th Ibero-American Conference on AI: Advances in Artificial Intelligence
Knowledge Extraction from Bilingual Corpora
Information Extraction: Towards Scalable, Adaptable Systems
Building Parallel Corpora by Automatic Title Alignment
ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
ECDL '98 Proceedings of the Second European Conference on Research and Advanced Technology for Digital Libraries
A Self-Learning Method of Parallel Texts Alignment
AMTA '00 Proceedings of the 4th Conference of the Association for Machine Translation in the Americas on Envisioning Machine Translation in the Information Future
An Internet Difference Engine and Its Applications
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Translation with Scarce Bilingual Resources
Machine Translation
Character N-Gram Tokenization for European Language Text Retrieval
Information Retrieval
Stochastic inversion transduction grammars and bilingual parsing of parallel corpora
Computational Linguistics
Bitext maps and alignment via pattern recognition
Computational Linguistics
Advances in domain independent linear text segmentation
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Termight: identifying and translating technical terminology
ANLC '94 Proceedings of the fourth conference on Applied natural language processing
High-performance bilingual text alignment using statistical and dictionary information
Natural Language Engineering
Semi-automatic acquisition of domain-specific translation lexicons
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
A DP based search using monotone alignments in statistical translation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
An alignment method for noisy parallel corpora based on image processing techniques
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A portable algorithm for mapping bitext correspondence
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Bitext correspondences through rich mark-up
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
An experiment in hybrid dictionary and statistical sentence alignment
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
An IR approach for translating new words from nonparallel, comparable texts
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A pattern matching method for finding noun and proper noun translations from noisy parallel corpora
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
An algorithm for simultaneously bracketing parallel texts by aligning words
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
An automatic method of finding topic boundaries
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Automatic alignment in parallel corpora
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
High-performance bilingual text alignment using statistical and dictionary information
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Bilingual text matching using bilingual dictionary and statistics
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
K-vec: a new approach for aligning parallel texts
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Automatic detection of omissions in translations
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Building parallel corpora by automatic title alignment using length-based and text-based approaches
Information Processing and Management: an International Journal
A robust cross-style bilingual sentences alignment model
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Identifying cognates by phonetic and semantic similarity
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
A web-trained extraction summarization system
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Chinese-Korean word alignment based on linguistic comparison
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Using confidence bands for parallel texts alignment
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
PENS: a machine-aided English writing system for Chinese users
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Translating collocations for use in bilingual lexicons
HLT '94 Proceedings of the workshop on Human Language Technology
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient optimization for bilingual sentence alignment based on linear regression
HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Aligning word senses using bilingual corpora
ACM Transactions on Asian Language Information Processing (TALIP)
Minimum cut model for spoken lecture segmentation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A DOM tree alignment model for mining parallel data from the web
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Journal of the American Society for Information Science and Technology
Translation corpus source and size in bilingual retrieval
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Aspect-based sentence segmentation for sentiment summarization
Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion
Comparison, selection and use of sentence alignment algorithms for new language pairs
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Aligning Portuguese and Chinese parallel texts using confidence bands
PRICAI'00 Proceedings of the 6th Pacific Rim international conference on Artificial intelligence
Exploring new languages with HAIRCUT at CLEF 2005
CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Approximate phrase match to compile synonymous translation terms for Korean medical indexing
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Combining sentence length with location information to align monolingual parallel texts
AIRS'04 Proceedings of the 2004 international conference on Asian Information Retrieval Technology
Cross-language retrieval using HAIRCUT at CLEF 2004
CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images
Using natural alignment to extract translation equivalents
PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Application of clause alignment for statistical machine translation
SSST-6 '12 Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al. (1991), Gale and Church (to appear), Isabelle (1992), Kay and Röscheisen (to appear), Simard et al. (1992), and Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), these methods tend to break down, because the noise can make it difficult to find paragraph boundaries, let alone sentence boundaries. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al. (1992).
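The abstract does not spell out how char_align works, so the following is only an illustrative sketch of the kind of character-level, cognate-style evidence it alludes to, not the program's actual algorithm. It assumes that character 4-grams shared by the two texts can serve as candidate correspondence points, and that very frequent n-grams should be skipped because they match too many positions to be informative. The function names (ngram_positions, shared_ngram_points) and the parameters n and max_freq are hypothetical and chosen for this example only.

```python
from collections import defaultdict


def ngram_positions(text, n=4):
    """Map each character n-gram to the list of offsets where it starts."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return positions


def shared_ngram_points(source, target, n=4, max_freq=2):
    """Return sorted (source_offset, target_offset) pairs for shared n-grams.

    Assumption for this sketch: an n-gram that occurs more than max_freq
    times in either text is ignored, since it would pair up with too many
    positions to be useful evidence.
    """
    src = ngram_positions(source.lower(), n)
    tgt = ngram_positions(target.lower(), n)
    points = []
    for gram, src_offsets in src.items():
        tgt_offsets = tgt.get(gram)
        if not tgt_offsets:
            continue  # n-gram does not occur in the target text
        if len(src_offsets) > max_freq or len(tgt_offsets) > max_freq:
            continue  # too frequent to be informative
        points.extend((i, j) for i in src_offsets for j in tgt_offsets)
    return sorted(points)


if __name__ == "__main__":
    # English/French cognates share character sequences such as "vern" and "ment".
    print(shared_ngram_points("the government", "le gouvernement"))
    # prints [(2, 1), (6, 6), (10, 11)]
```

No sentence or paragraph boundaries are needed to compute these points, which is why evidence of this kind can survive OCR noise and unknown markup. Turning the points into a full alignment would require a further path-finding step that is not shown here.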