Collocation extraction using monolingual word alignment method

Authors:
Zhanyi Liu;Haifeng Wang;Hua Wu;Sheng Li
Affiliations:
Harbin Institute of Technology, Harbin, China and Toshiba (China) Research and Development Center, Beijing, China;Toshiba (China) Research and Development Center, Beijing, China;Toshiba (China) Research and Development Center, Beijing, China;Harbin Institute of Technology, Harbin, China
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Year:
2009

Abstract

Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to a monolingual scenario in order to extract collocations from a monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, in which each sentence pair consists of two identical sentences in the same language. A monolingual word alignment algorithm is then employed to align the potentially collocated words within each sentence. Finally, the aligned word pairs are ranked according to refined alignment probabilities, and those with higher scores are extracted as collocations. We conducted experiments on Chinese and English corpora separately. Compared with previous approaches, which apply association measures to word pairs co-occurring within a given window, our method achieves higher precision and recall. In a human evaluation of precision, our method achieves absolute improvements of 27.9% on the Chinese corpus and 23.6% on the English corpus. In particular, it can extract collocations with longer spans, achieving a precision of 69% on the long-span (6) Chinese collocations.
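
The pipeline described in the abstract (replicate the corpus, align each sentence with itself, rank the aligned pairs) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the alignment models, so the sketch assumes an IBM Model 1 style EM procedure in which a token may not align to its own position. The function names `train_monolingual_model1` and `rank_pairs` are hypothetical, and the symmetric average used for ranking is an assumed stand-in for the "refined alignment probabilities" mentioned above.

```python
from collections import defaultdict


def train_monolingual_model1(sentences, iterations=10):
    """EM for an IBM Model 1 style table t[(w, v)] ~ p(w | v).

    Each sentence is implicitly paired with a copy of itself; a token at
    position i may align to any position j != i (assumed constraint).
    """
    vocab = {w for s in sentences for w in s}
    # Uniform initialisation over the vocabulary.
    t = defaultdict(lambda: 1.0 / len(vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for sent in sentences:
            for i, w in enumerate(sent):
                # Candidate partners: every other position in the same sentence.
                partners = [v for j, v in enumerate(sent) if j != i]
                if not partners:
                    continue
                norm = sum(t[(w, v)] for v in partners)
                for v in partners:
                    frac = t[(w, v)] / norm  # expected alignment count
                    count[(w, v)] += frac
                    total[v] += frac
        # M-step: renormalise so that sum over w of p(w | v) equals 1.
        t = defaultdict(float, {(w, v): c / total[v] for (w, v), c in count.items()})
    return t


def rank_pairs(t):
    """Rank unordered word pairs by the average of p(a | b) and p(b | a).

    The averaging is an illustrative assumption, not the paper's formula.
    """
    scores = {}
    for (w, v) in t:
        key = tuple(sorted((w, v)))
        scores[key] = (t[(w, v)] + t.get((v, w), 0.0)) / 2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toy_corpus = [
        ["drink", "strong", "tea"],
        ["make", "strong", "tea"],
        ["drink", "hot", "coffee"],
        ["make", "hot", "coffee"],
    ]
    table = train_monolingual_model1(toy_corpus)
    for pair, score in rank_pairs(table)[:5]:
        print(pair, round(score, 3))
```

Because candidate partners are drawn from the whole sentence rather than a fixed window, a sketch of this kind can in principle pair words that are far apart, which is the property the abstract highlights for long-span collocations.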