One deficiency of mutual information is its poor capacity to measure the association of words with asymmetric co-occurrence, which is common among multiword expressions in text. Moreover, threshold setting, which is decisive for the practical success of mutual information in multiword extraction, requires many parameters to be predefined manually when extracting multiword expressions containing different numbers of words. In this paper, we propose a new method, EMICO (Enhanced Mutual Information and Collocation Optimization), to extract substantival multiword expressions from text. Specifically, enhanced mutual information measures the association of words, and collocation optimization automatically determines the number of individual words contained in a multiword expression when that expression occurs in a candidate set. Our experiments show that EMICO significantly improves the performance of substantival multiword expression extraction compared with a classic extraction method based on mutual information.
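For orientation, the kind of classic baseline the abstract compares against can be sketched as standard pointwise mutual information over adjacent word pairs, filtered by a hand-set threshold. This is not EMICO: the abstract does not give the enhanced formula or the collocation optimization step, so neither is reproduced here. The function name `pmi_bigrams`, the parameters `threshold` and `min_count`, and the toy sentence are all assumptions made for illustration.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, threshold=1.0, min_count=2):
    """Score adjacent word pairs with pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts.
    `threshold` and `min_count` are hypothetical, manually chosen parameters.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        if pmi >= threshold:
            scores[(w1, w2)] = pmi
    return scores

# Toy input, invented for this sketch.
tokens = ("support vector machine uses support vector data "
          "and support vector models").split()
scores = pmi_bigrams(tokens)
```

In this sketch both the cut-off and the candidate length (two words) are fixed by hand, and a separate threshold would be needed for each longer candidate length. This is the manual parameter burden the abstract attributes to mutual-information-based extraction.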