We address the problem of classifying multiword expression tokens in running text. We focus on Verb-Noun Constructions (VNCs) whose idiomaticity varies with context, and classify each VNC token as either idiomatic or literal. Our approach rests on the assumption that a literal VNC has more in common with its component words than an idiomatic one does, where commonality is measured by contextual overlap. To this end, we explore different contextual variations and different similarity measures. We also identify a new data set, OPAQUE, that comprises only non-decomposable VNC expressions. Our approach yields state-of-the-art performance, with an overall accuracy of 77.56% on a TEST data set and 81.66% on the newly characterized OPAQUE data set.
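
The overlap assumption can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words context representation, the cosine measure, the stopword list, the 0.2 threshold, and the names `context_vector`, `cosine`, and `classify_vnc` are all assumptions made here for illustration, and the example sentences are toy data. The sketch compares the context of one VNC token with the pooled contexts of its verb and noun, and labels the token literal when the overlap is high.

```python
import math
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "with", "he", "she"}

def context_vector(tokens, exclude=()):
    """Bag-of-words counts for a context, skipping stopwords and excluded words."""
    skip = STOPWORDS | set(exclude)
    return Counter(t.lower() for t in tokens if t.lower() not in skip)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify_vnc(token_context, verb_contexts, noun_contexts, verb, noun,
                 threshold=0.2):
    """Return (label, score) for one VNC token.

    The token is called 'literal' when its context overlaps with the pooled
    contexts of its component words by at least `threshold` (an assumed value).
    """
    exclude = {verb, noun}  # the expression itself carries no contextual signal
    token_vec = context_vector(token_context, exclude)
    component_vec = Counter()
    for ctx in list(verb_contexts) + list(noun_contexts):
        component_vec.update(context_vector(ctx, exclude))
    score = cosine(token_vec, component_vec)
    return ("literal" if score >= threshold else "idiomatic"), score

# Toy contexts for the component words "kick" and "bucket".
verb_contexts = [["he", "kick", "the", "ball", "hard"],
                 ["kick", "the", "door", "open"]]
noun_contexts = [["fill", "the", "bucket", "with", "water"],
                 ["the", "bucket", "fell", "over", "the", "door"]]

literal_use = ["she", "kick", "the", "bucket", "of", "water", "over"]
idiomatic_use = ["grandfather", "kick", "the", "bucket", "last", "winter",
                 "peacefully"]

literal_result = classify_vnc(literal_use, verb_contexts, noun_contexts,
                              "kick", "bucket")
idiomatic_result = classify_vnc(idiomatic_use, verb_contexts, noun_contexts,
                                "kick", "bucket")
```

On this toy data the literal use shares the words "water" and "over" with the component-word contexts, while the idiomatic use shares none. A different similarity measure or context window can be swapped in by replacing `cosine` or `context_vector`.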