Synonymous collocation extraction using translation information

Authors:
Hua Wu; Ming Zhou
Affiliations:
Microsoft Research Asia, Haidian District, Beijing, China; Microsoft Research Asia, Haidian District, Beijing, China
Venue:
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

Automatically acquiring synonymous collocation pairs from corpora is a challenging task. For this task, we generally have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use only monolingual corpora or only bilingual corpora are inadequate: the former suffer from low precision and the latter from low coverage. In this paper, we propose a method that uses both resources to reach an optimal compromise between precision and coverage. The method first extracts candidate synonymous collocation pairs using a monolingual corpus and a word thesaurus, and then selects the appropriate pairs from these candidates using their translations in a second language. The translations of the candidates are obtained with a statistical translation model trained on a small bilingual corpus and a large monolingual corpus. Translation information proves effective for selecting synonymous collocation pairs. Experimental results show that our approach achieves an average precision of 74% and an average recall of 64%, outperforming both methods that use only monolingual corpora and methods that use only bilingual corpora.
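The two-stage procedure described above (thesaurus-based candidate generation, then translation-based selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Collocations are modeled as (head, relation, dependent) triples. The translation distributions are hard-coded toy values standing in for the output of the statistical translation model. Cosine similarity with a threshold of 0.5 is an assumed stand-in for the paper's actual selection measure. The function names `candidate_pairs` and `select_synonymous`, and all data, are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple

# A collocation is modeled as a (head, dependency relation, dependent) triple.
Collocation = Tuple[str, str, str]
Pair = Tuple[Collocation, Collocation]


def candidate_pairs(collocations: Set[Collocation],
                    thesaurus: Dict[str, Set[str]]) -> List[Pair]:
    """Stage 1: pair corpus collocations that differ in exactly one word,
    where the substituted word is a thesaurus synonym of the original."""
    pairs = set()
    for head, rel, dep in collocations:
        # Substitute the head with each of its synonyms, keeping rel and dep.
        variants = [(syn, rel, dep) for syn in thesaurus.get(head, set())]
        # Substitute the dependent with each of its synonyms.
        variants += [(head, rel, syn) for syn in thesaurus.get(dep, set())]
        for other in variants:
            # Keep the variant only if it is also attested in the corpus.
            if other in collocations and other != (head, rel, dep):
                # Store each unordered pair once, in sorted order.
                a, b = sorted([(head, rel, dep), other])
                pairs.add((a, b))
    return sorted(pairs)


def cosine(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity of two sparse translation-probability vectors."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm = sqrt(sum(w * w for w in p.values())) * sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def select_synonymous(pairs: List[Pair],
                      translations: Dict[Collocation, Dict[str, float]],
                      threshold: float = 0.5) -> List[Tuple[Collocation, Collocation, float]]:
    """Stage 2: keep candidate pairs whose translation distributions are similar.
    Assumption: cosine similarity and the 0.5 threshold are illustrative
    stand-ins, not necessarily the measure used in the paper."""
    selected = []
    for a, b in pairs:
        score = cosine(translations.get(a, {}), translations.get(b, {}))
        if score >= threshold:
            selected.append((a, b, score))
    return selected


# Hypothetical toy data; target-language collocations are written in pinyin.
# The thesaurus lists "accept" as a synonym of "buy" (as in "buy an idea"),
# which yields a candidate that is wrong in this context.
collocations = {("buy", "OBJ", "book"), ("purchase", "OBJ", "book"), ("accept", "OBJ", "book")}
thesaurus = {"buy": {"purchase", "accept"}, "purchase": {"buy"}, "accept": {"buy"}}
translations = {
    ("buy", "OBJ", "book"): {"mai shu": 0.8, "gou mai shu": 0.2},
    ("purchase", "OBJ", "book"): {"gou mai shu": 0.7, "mai shu": 0.3},
    ("accept", "OBJ", "book"): {"jie shou shu": 0.9, "shou xia shu": 0.1},
}

if __name__ == "__main__":
    for a, b, score in select_synonymous(candidate_pairs(collocations, thesaurus), translations):
        print(a, b, round(score, 3))
```

On this toy data, stage 1 proposes two candidates, but only the buy/purchase pair survives stage 2 (score about 0.61), because "accept a book" shares no translations with "buy a book". This mirrors the abstract's argument: the thesaurus alone over-generates, and translation information filters out the context-inappropriate substitutions.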