A word-to-word model of translational equivalence

Authors:
I. Dan Melamed
Affiliations:
University of Pennsylvania, Philadelphia, PA
Venue:
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Year:
1997



Abstract

Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as part-of-speech, dictionaries, word order, etc. Our model can link word tokens in parallel texts as well as other translation models in the literature. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
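
The single-threshold control mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes every candidate word pair already carries some score, and the names `filter_lexicon` and `precision_recall`, the toy scores, and the toy gold set are all hypothetical. It only shows how raising one threshold trades recall for precision.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (source word, target word)


def filter_lexicon(scores: Dict[Pair, float], threshold: float) -> Set[Pair]:
    """Keep only the word pairs whose score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Compare a predicted lexicon against a gold-standard lexicon."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 1.0
    recall = correct / len(gold) if gold else 1.0
    return precision, recall


# Hypothetical scores for candidate pairs (illustrative values only).
toy_scores = {
    ("house", "maison"): 0.9,
    ("dog", "chien"): 0.8,
    ("house", "chien"): 0.3,  # a spurious pair
    ("cat", "chat"): 0.2,     # a correct pair with a low score
}
toy_gold = {("house", "maison"), ("dog", "chien"), ("cat", "chat")}

for threshold in (0.1, 0.5):
    kept = filter_lexicon(toy_scores, threshold)
    p, r = precision_recall(kept, toy_gold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

With these toy values, the lower threshold keeps every pair (full recall, one spurious entry), while the higher threshold drops the spurious pair at the cost of also losing one correct low-scoring pair.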