Texts that are available in two languages (bitexts) are becoming more and more plentiful, both in private data warehouses and on publicly accessible sites on the World Wide Web. As with other kinds of data, the value of bitexts largely depends on the efficacy of the available data mining tools. The first step in extracting useful information from bitexts is to find corresponding words and/or text segment boundaries in their two halves (bitext maps).

This article advances the state of the art of bitext mapping by formulating the problem in terms of pattern recognition. From this point of view, the success of a bitext mapping algorithm hinges on how well it performs three tasks: signal generation, noise filtering, and search. The Smooth Injective Map Recognizer (SIMR) algorithm presented here integrates innovative approaches to each of these tasks. Objective evaluation has shown that SIMR's accuracy is consistently high for language pairs as diverse as French/English and Korean/English. If necessary, SIMR's bitext maps can be efficiently converted into segment alignments using the Geometric Segment Alignment (GSA) algorithm, which is also presented here.

SIMR has produced bitext maps for over 200 megabytes of French-English bitexts. GSA has converted these maps into alignments. Both the maps and the alignments are available from the Linguistic Data Consortium.
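The three-task framing named above (signal generation, noise filtering, search) can be illustrated with a toy sketch. This is not SIMR: the abstract does not specify its matching predicate, filter, or search strategy, so every function name, the exact-token matching rule, the diagonal-distance filter, and the `max_offset` parameter below are assumptions chosen only to make the three stages concrete. The search stage here simply keeps the longest chain of points that increases strictly in both coordinates, which is one simple way to obtain an injective, monotone map.

```python
from bisect import bisect_left


def candidate_points(src_tokens, tgt_tokens):
    """Signal generation (toy): pair up positions whose tokens match exactly,
    ignoring case. Numbers and names often survive translation unchanged."""
    positions = {}
    for j, tok in enumerate(tgt_tokens):
        positions.setdefault(tok.lower(), []).append(j)
    return [(i, j)
            for i, tok in enumerate(src_tokens)
            for j in positions.get(tok.lower(), [])]


def filter_noise(points, src_len, tgt_len, max_offset):
    """Noise filtering (toy): drop points farther than max_offset target
    positions from the main diagonal of the bitext space."""
    if src_len == 0:
        return []
    slope = tgt_len / src_len
    return [(i, j) for i, j in points if abs(j - i * slope) <= max_offset]


def longest_monotone_chain(points):
    """Search (toy): longest chain strictly increasing in both coordinates,
    so no source or target position is used twice."""
    # Sorting ties on i by descending j prevents two points with the same i
    # from both entering a strictly increasing chain.
    ordered = sorted(set(points), key=lambda p: (p[0], -p[1]))
    tails = []    # tails[k]: index in `ordered` ending the best chain of length k+1
    tail_js = []  # j values of those tail points, kept sorted for bisect
    prev = [None] * len(ordered)
    for idx, (_, j) in enumerate(ordered):
        k = bisect_left(tail_js, j)
        prev[idx] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(idx)
            tail_js.append(j)
        else:
            tails[k] = idx
            tail_js[k] = j
    chain = []
    idx = tails[-1] if tails else None
    while idx is not None:
        chain.append(ordered[idx])
        idx = prev[idx]
    return chain[::-1]


def toy_bitext_map(src_tokens, tgt_tokens, max_offset=2.0):
    """Run the three toy stages in sequence."""
    points = candidate_points(src_tokens, tgt_tokens)
    points = filter_noise(points, len(src_tokens), len(tgt_tokens), max_offset)
    return longest_monotone_chain(points)


src = ["In", "1999", "Paris", "hosted", "42", "delegates"]
tgt = ["En", "1999", "Paris", "a", "accueilli", "42", "délégués"]
print(toy_bitext_map(src, tgt))  # [(1, 1), (2, 2), (4, 5)]
```

Each returned pair is a (source position, target position) point of correspondence. A real bitext mapper would need a far more robust matching predicate and a search that tolerates gaps and local reordering; the sketch only shows how the three stages compose.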