Handling sparsity for verb noun MWE token classification

  • Authors:
  • Mona T. Diab; Madhav Krishna

  • Affiliations:
  • Columbia University; Columbia University

  • Venue:
  • GEMS '09 Proceedings of the Workshop on Geometrical Models of Natural Language Semantics
  • Year:
  • 2009

Abstract

We address the problem of classifying multiword expression (MWE) tokens in running text. We focus our study on Verb-Noun Constructions (VNCs) whose idiomaticity varies with context; each VNC token is classified as either idiomatic or literal. Our approach hinges on the assumption that a literal VNC token has more in common with its component words than an idiomatic one does, where commonality is measured by contextual overlap. To this end, we explore different contextual variations and different similarity measures, and we handle the sparsity of the possible contexts via four different parameter variations. Our approach yields state-of-the-art performance, with an overall accuracy of 75.54% on a TEST data set.
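The core intuition, that a literal VNC token shares more context with its component words than an idiomatic one does, can be illustrated with a minimal sketch. The abstract does not specify the context window, the similarity measure, the decision rule, or the four sparsity-handling parameter variations. Everything below is therefore an assumption and not the authors' method. It uses a symmetric bag-of-words window, cosine similarity averaged over the verb and the noun, and a fixed threshold. The function names `context_counts`, `cosine`, and `classify_vnc_token`, together with the `threshold=0.2` value, are hypothetical.

```python
import math
from collections import Counter


def context_counts(tokens, start, end, window=2):
    """Bag of words within `window` tokens before `start` and after `end` (exclusive).

    Assumption: a simple symmetric window around the VNC span; the paper's
    actual contextual variations are not given in the abstract.
    """
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return Counter(left + right)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context overlaps with nothing
    return dot / (norm_a * norm_b)


def classify_vnc_token(token_ctx, verb_ctx, noun_ctx, threshold=0.2):
    """Hypothetical rule: literal if the mean overlap with the components reaches `threshold`."""
    overlap = (cosine(token_ctx, verb_ctx) + cosine(token_ctx, noun_ctx)) / 2
    return "literal" if overlap >= threshold else "idiomatic"


# Toy usage with invented counts (not data from the paper).
verb_ctx = Counter({"hit": 2, "ball": 3, "hand": 1})  # contexts of the verb alone
noun_ctx = Counter({"ball": 1, "bat": 2, "hand": 1})  # contexts of the noun alone
print(classify_vnc_token(Counter({"ball": 1, "hand": 1}), verb_ctx, noun_ctx))     # literal
print(classify_vnc_token(Counter({"company": 1, "news": 1}), verb_ctx, noun_ctx))  # idiomatic
```

The toy example also shows why sparsity matters. When a token's context shares no words with either component, every overlap score collapses to zero. The four parameter variations mentioned in the abstract are aimed at handling that sparsity, and this sketch does not model them.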