Automatic identification of word translations from unrelated English and German corpora

Authors:
Reinhard Rapp
Affiliations:
University of Mainz, Germersheim, Germany
Venue:
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Year:
1999

Abstract

Algorithms for aligning words in translated texts are well established. Only recently, however, have approaches been proposed for identifying word translations from non-parallel or even unrelated texts. This task is more difficult because most of the statistical clues that are useful for processing parallel texts cannot be applied to non-parallel texts. Whereas some studies on parallel texts have shown up to 99% of word alignments to be correct, accuracy for non-parallel texts has so far been around 30%. The current study, which assumes that the patterns of word co-occurrence in corpora of different languages are correlated, significantly improves on this: about 72% of word translations are identified correctly.
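
The co-occurrence assumption stated in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the raw co-occurrence counts, the cosine similarity, the window size, the toy corpora, the seed lexicon, and all function names are assumptions chosen for illustration. The idea shown is that a German word's context words, once mapped into English through a small seed lexicon, should resemble the context words of its English translation, even when the two corpora are unrelated.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vector(sentences, target, window=4):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_translations(source_word, src_sents, tgt_sents, seed, candidates, window=4):
    """Rank target-language candidates by context similarity to `source_word`."""
    src_vec = cooccurrence_vector(src_sents, source_word, window)
    # Map source-language context words into the target language using the
    # (hypothetical) seed lexicon; context words without an entry are dropped.
    projected = Counter()
    for word, count in src_vec.items():
        if word in seed:
            projected[seed[word]] += count
    scored = [
        (cosine(projected, cooccurrence_vector(tgt_sents, cand, window)), cand)
        for cand in candidates
    ]
    return sorted(scored, reverse=True)


# Invented toy corpora: the German and English sentences are not translations
# of each other.
german = [
    ["der", "lehrer", "arbeitet", "in", "der", "schule"],
    ["die", "schule", "hat", "einen", "lehrer"],
]
english = [
    ["the", "teacher", "works", "at", "a", "school"],
    ["a", "doctor", "works", "at", "a", "hospital"],
    ["every", "school", "needs", "a", "teacher"],
]
# Hypothetical seed lexicon of already-known translations.
seed = {"schule": "school", "arbeitet": "works", "krankenhaus": "hospital"}

ranking = rank_translations("lehrer", german, english, seed, ["teacher", "doctor"])
for score, candidate in ranking:
    print(f"{candidate}: {score:.2f}")
```

On this toy data, "teacher" ranks above "doctor" because it shares both "school" and "works" with the projected context of "lehrer", whereas "doctor" shares only "works". A realistic system would need far larger corpora and would probably replace raw counts with an association measure; those choices are left open here.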