Translating collocations for use in bilingual lexicons

Authors:
Frank Smadja;Kathleen McKeown
Affiliations:
Columbia University, New York, NY;Columbia University, New York, NY
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Citing 13
Cited 5

Identifying word correspondence in parallel texts

HLT '91 Proceedings of the workshop on Speech and Natural Language
Information retrieval: data structures and algorithms

Information retrieval: data structures and algorithms
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Retrieving collocations from text: Xtract

Computational Linguistics - Special issue on using large corpora: I
A stochastic parts program and noun phrase parser for unrestricted text

ANLC '88 Proceedings of the second conference on Applied natural language processing
Two languages are more informative than one

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Aligning sentences in parallel corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Word-sense disambiguation using statistical methods

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Char_align: a program for aligning parallel texts at the character level

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning sentences in bilingual corpora using lexical information

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
An algorithm for finding noun phrase correspondences in bilingual corpora

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Automatically extracting and representing collocations for language generation

ACL '90 Proceedings of the 28th annual meeting on Association for Computational Linguistics

A Technical Word- and Term-Translation Aid Using Noisy Parallel Corpora across Language Groups

Machine Translation
High-performance bilingual text alignment using statistical and dictionary information

Natural Language Engineering
A pattern matching method for finding noun and proper noun translations from noisy parallel corpora

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Exploiting translational correspondences for pattern-independent MWE identification

MWE '09 Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications
Clustering for data matching

KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I

Quantified Score

Hi-index	0.00

Visualization

Abstract

Collocations are notoriously difficult for non-native speakers to translate, primarily because they are opaque and can not be translated on a word by word basis. We describe a program named Champollion which, given a pair of parallel corpora in two different languages, automatically produces translations of an input list of collocations. Our goal is to provide a tool to compile bilingual lexical information above the word level in multiple languages and domains. The algorithm we use is based on statistical methods and produces p word translations of n word collocations in which n and p need not be the same; the collocations can be either flexible or fixed compounds. For example, Champollion translates "to make a decision," "employment equity," and "stock market," respectively into: "prendre une decision," "équité en matière d'emploi," and "bourse." Testing and evaluation of Champollion on one year's worth of the Hansards corpus yielded 300 collocations and their translations, evaluated at 77% accuracy. In this paper, we describe the statistical measures used, the algorithm, and the implementation of Champollion, presenting our results and evaluation.