Handling sparsity for verb noun MWE token classification

Authors: Mona T. Diab; Madhav Krishna
Affiliations: Columbia University; Columbia University
Venue: GEMS '09 Proceedings of the Workshop on Geometrical Models of Natural Language Semantics
Year: 2009

Abstract

We address the problem of classifying multiword expression (MWE) tokens in running text. We focus on Verb-Noun Constructions (VNCs) whose idiomaticity varies with context, and classify each VNC token as either idiomatic or literal. Our approach rests on the assumption that a literal VNC has more in common with its component words than an idiomatic one, where commonality is measured by contextual overlap. To this end, we explore different contextual variations and different similarity measures, handling the sparsity of the possible contexts via four different parameter variations. Our approach yields state-of-the-art performance, with an overall accuracy of 75.54% on a TEST data set.
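The core assumption can be illustrated with a minimal sketch. It is not the authors' implementation: the sentence-level bag-of-words context, the cosine measure, the 0.5 threshold, and the function names `context_vector`, `cosine`, and `classify_vnc` are all assumptions chosen for illustration, and the sparsity-handling parameter variations mentioned in the abstract are not modeled.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, targets):
    """Bag of words for a sentence, excluding the target words themselves."""
    return Counter(t for t in tokens if t not in targets)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_vnc(sentence, verb, noun, corpus, threshold=0.5):
    """Label a VNC token by its contextual overlap with its component words.

    Hypothetical decision rule: 'literal' if the token's context is similar
    enough to the contexts in which the verb or the noun occurs on its own.
    The threshold value is arbitrary, not taken from the paper.
    """
    targets = {verb, noun}
    token_ctx = context_vector(sentence, targets)
    component_ctx = Counter()
    for sent in corpus:
        # Use only sentences containing exactly one of the two components,
        # so the component contexts exclude the VNC itself.
        if (verb in sent) != (noun in sent):
            component_ctx += context_vector(sent, targets)
    similar = cosine(token_ctx, component_ctx) >= threshold
    return "literal" if similar else "idiomatic"
```

Here `sentence` and each element of `corpus` are lists of (ideally lemmatized) tokens. With so little context per token, most vector entries are zero, which is the sparsity problem the paper's parameter variations are meant to address.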