Collocations are linguistic phenomena that occur when two or more words appear together more often than by chance and whose meaning often cannot be inferred from the meanings of their parts. Because collocations have found many applications in natural language processing, information retrieval, and text mining, extracting them from large corpora has been the focus of many studies over the past few years. In this paper, we introduce the notion of an extension pattern, a formalization of the idea of extending lexical association measures (AMs) defined for bigrams. An extension pattern provides a measure-independent way of extending AMs to extract collocations of arbitrary length. We define different extension patterns and compare them on the task of extracting collocations from a newspaper corpus. We show that the stopword-sensitive extension patterns we propose outperform other extensions, which indicates that AMs could benefit from taking into account linguistic information about an n-gram's part-of-speech pattern.
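
To make the idea of extending a bigram AM more concrete, the following is a minimal sketch, not one of the extension patterns defined in the paper. It assumes pointwise mutual information (PMI) as the bigram measure and uses a hypothetical extension that splits a trigram into two adjacent parts in both possible ways, scores each split as if it were a bigram, and averages the results. The function names (ngram_counts, pmi, trigram_pmi_by_splits) and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information (base 2) of a pair, given probabilities."""
    return math.log2(p_xy / (p_x * p_y))


def trigram_pmi_by_splits(tokens, trigram):
    """Hypothetical extension of bigram PMI to a trigram.

    The trigram (w1, w2, w3) is split as (w1 | w2 w3) and (w1 w2 | w3);
    each split is scored with the bigram measure and the scores are averaged.
    This is an assumption made for illustration only.
    """
    counts = {n: ngram_counts(tokens, n) for n in (1, 2, 3)}

    def prob(gram):
        # Relative frequency among all n-grams of the same length.
        table = counts[len(gram)]
        return table[gram] / sum(table.values())

    p = prob(trigram)
    if p == 0:
        return float("-inf")  # unseen trigram: no evidence of association
    scores = [pmi(p, prob(trigram[:k]), prob(trigram[k:])) for k in (1, 2)]
    return sum(scores) / len(scores)


# Toy corpus, invented for this example.
tokens = "new york city is big and new york city is old".split()
score = trigram_pmi_by_splits(tokens, ("new", "york", "city"))
```

The averaging step is independent of the underlying measure: any function of the same three probabilities could replace pmi here. A stopword-sensitive variant would additionally need a stopword list or part-of-speech information, which this sketch does not model.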