Word association norms, mutual information, and lexicography
Computational Linguistics
Querying across languages: a dictionary-based approach to multilingual information retrieval
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Foundations of statistical natural language processing
Foundations of statistical natural language processing
Introduction to the special issue on computational linguistics using large corpora
Computational Linguistics - Special issue on using large corpora: I
Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
Retrieving collocations from text: Xtract
Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Unsupervised word sense disambiguation rivaling supervised methods
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A "not-so-shallow" parser for collocational analysis
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Using collocations for topic segmentation and link detection
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Synonymous collocation extraction using translation information
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Improved statistical alignment models
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Accurate collocation extraction using a multilingual parser
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Discriminative pruning of language models for Chinese word segmentation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Significance tests for the evaluation of ranking methods
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Collocation extraction based on modifiability statistics
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Combining association measures for collocation extraction
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Multi-word expression identification using sentence surface features
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
A hybrid approach for multiword expression identification
PROPOR'10 Proceedings of the 9th international conference on Computational Processing of the Portuguese Language
Hi-index | 0.00 |
Statistical bilingual word alignment has been well studied in the field of machine translation. This article adapts the bilingual word alignment algorithm into a monolingual scenario to extract collocations from monolingual corpus, based on the fact that the words in a collocation tend to co-occur in similar contexts as in bilingual word alignment. First, the monolingual corpus is replicated to generate a parallel corpus, in which each sentence pair consists of two identical sentences. Next, the monolingual word alignment algorithm is employed to align potentially collocated words. Finally, the aligned word pairs are ranked according to the alignment scores and candidates with higher scores are extracted as collocations. We conducted experiments on Chinese and English corpora respectively. Compared to previous approaches that use association measures to extract collocations from co-occurrence word pairs within a given window, our method achieves higher precision and recall. According to human evaluation, our method achieves precisions of 62% on a Chinese corpus and 64% on an English corpus. In particular, we can extract collocations with longer spans, achieving a higher precision of 83% on the long-span ( 6 words) Chinese collocations.