We review lexical association measures (AMs) that past work has employed to extract multiword expressions. Our work contributes to the understanding of these AMs by categorizing them into two groups and by using rank equivalence to group together AMs with identical ranking performance. We also examine how existing AMs can be adapted to better rank English verb-particle constructions and light verb constructions. Specifically, we suggest normalizing (Pointwise) Mutual Information and using marginal frequencies to construct penalization terms. We empirically validate the effectiveness of these modified AMs on detection tasks over the Penn Treebank, where they show significant improvements over the original AMs.
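As a minimal sketch of the kind of measure discussed above, the following computes Pointwise Mutual Information for a bigram from raw corpus counts, together with one standard normalization (dividing by -log2 of the joint probability, which bounds the score to [-1, 1]). The function names and the particular normalization are illustrative assumptions, not necessarily the exact variant proposed in the abstract.

```python
import math

def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise Mutual Information of a bigram (x, y) from corpus counts.

    n_xy: joint count of the bigram; n_x, n_y: marginal counts of each
    word; n_total: total number of bigram tokens in the corpus.
    """
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    return math.log2(p_xy / (p_x * p_y))

def normalized_pmi(n_xy, n_x, n_y, n_total):
    """PMI divided by -log2 p(x, y), giving a score in [-1, 1].

    This is one common normalization (an assumption here); it damps the
    bias of raw PMI toward very low-frequency pairs.
    """
    p_xy = n_xy / n_total
    return pmi(n_xy, n_x, n_y, n_total) / -math.log2(p_xy)

# Example: a bigram seen 10 times, marginals 20 and 50, in 1000 bigrams.
print(pmi(10, 20, 50, 1000))             # log2(10) ~= 3.32
print(normalized_pmi(10, 20, 50, 1000))  # 0.5
```

Candidate expressions would then be ranked by such scores, with higher values indicating stronger association between the component words.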