A pattern matching method for finding noun and proper noun translations from noisy parallel corpora

Authors:
Pascale Fung
Affiliations:
Columbia University, New York, NY
Venue:
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Year:
1995

Abstract

We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information from only one of the languages is used. Word frequency and position information is represented in two different vector forms for pattern matching, one for high-frequency words and one for low-frequency words. New techniques for finding anchor points and eliminating noise are introduced. The method achieves a precision of 73.1%. We also show how the results can be used to compile domain-specific noun phrases.
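
The abstract does not spell out the two vector forms or the matching procedure, so the following is only an illustrative sketch of how word position information could be turned into a vector and compared across two texts. It assumes one plausible representation (the gaps between successive occurrences of a word) and one plausible matcher (dynamic time warping); the function names `positions`, `gaps`, and `dtw_distance` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def positions(tokens: Sequence[str], word: str) -> List[int]:
    """Return the token offsets at which `word` occurs in the text."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def gaps(pos: Sequence[int]) -> List[int]:
    """Convert absolute positions into distances between successive occurrences.

    Assumption: this gap form is one possible "position vector"; the abstract
    does not say which form the method actually uses.
    """
    return [b - a for a, b in zip(pos, pos[1:])]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping cost between two sequences.

    Uses absolute difference as the local cost. Returns infinity when exactly
    one of the sequences is empty, and 0.0 when both are empty.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # skip ahead in a
                cost[i][j - 1],      # skip ahead in b
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Toy usage: two made-up token streams standing in for the two sides of a corpus.
source = "x a y a z z a".split()
target = "b q b q q b q".split()
src_vec = gaps(positions(source, "a"))
tgt_vec = gaps(positions(target, "b"))
score = dtw_distance(src_vec, tgt_vec)  # lower means more similar occurrence patterns
```

In a sketch like this, a candidate translation pair would be ranked by `score`, with lower values indicating that the two words recur at similar intervals. Warping tolerates the insertions and deletions that make a parallel corpus noisy, which is why it is used here in place of a position-by-position comparison.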