Automatically acquiring synonymous collocation pairs from corpora is a challenging task. The resources typically available for this task are a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or bilingual corpora alone are inadequate because of low precision or low coverage. In this paper, we propose a method that uses both resources to reach an optimal compromise between precision and coverage. The method first extracts candidate synonymous collocation pairs using a monolingual corpus and a word thesaurus, and then selects the appropriate pairs from the candidates using their translations in a second language. The translations of the candidates are obtained with a statistical translation model trained on a small bilingual corpus and a large monolingual corpus. The translation information proves effective for selecting synonymous collocation pairs. Experimental results indicate that the average precision and recall of our approach are 74% and 64%, respectively, outperforming methods that use only monolingual corpora and methods that use only bilingual corpora.
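The two-stage idea described in the abstract (generate candidates from a monolingual corpus and a thesaurus, then filter them by their translations) can be illustrated with a small sketch. This is not the paper's implementation: the function names, the toy data, the placeholder translation labels, and the use of a simple set-overlap (Jaccard) score with a fixed threshold in place of a statistical translation model are all assumptions made for illustration.

```python
from itertools import product

def candidate_pairs(collocations, thesaurus):
    """Stage 1 (sketch): pair each collocation with variants formed by
    swapping in thesaurus synonyms, keeping only variants that also
    occur in the collocation list."""
    seen = set(collocations)
    pairs = set()
    for head, rel, dep in collocations:
        heads = {head} | thesaurus.get(head, set())
        deps = {dep} | thesaurus.get(dep, set())
        for h, d in product(heads, deps):
            other = (h, rel, d)
            if other != (head, rel, dep) and other in seen:
                pairs.add(frozenset([(head, rel, dep), other]))
    return pairs

def translation_overlap(a, b, translations):
    """Assumed stand-in for translation-based similarity: Jaccard overlap
    of the two collocations' translation sets."""
    ta, tb = translations.get(a, set()), translations.get(b, set())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def select_pairs(pairs, translations, threshold=0.5):
    """Stage 2 (sketch): keep candidates whose translations overlap enough.
    The threshold value is arbitrary."""
    return {p for p in pairs
            if translation_overlap(*tuple(p), translations) >= threshold}

# Toy, made-up data (hypothetical; not taken from the paper).
collocations = [
    ("buy", "OBJ", "ticket"), ("purchase", "OBJ", "ticket"),
    ("take", "OBJ", "break"), ("get", "OBJ", "break"),
]
thesaurus = {"buy": {"purchase"}, "purchase": {"buy"},
             "take": {"get"}, "get": {"take"}}
# Placeholder labels T1..T5 stand in for second-language translations.
translations = {
    ("buy", "OBJ", "ticket"): {"T1", "T2"},
    ("purchase", "OBJ", "ticket"): {"T1", "T2", "T3"},
    ("take", "OBJ", "break"): {"T4"},
    ("get", "OBJ", "break"): {"T5"},
}

candidates = candidate_pairs(collocations, thesaurus)
selected = select_pairs(candidates, translations)
```

In this toy run the thesaurus proposes both pairs as candidates, but only the pair whose placeholder translations overlap survives the second stage, which mirrors the precision-oriented role the abstract assigns to translation information.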