Semantics-based multiword expression extraction

  • Authors:
  • Tim Van de Cruys;Begoña Villada Moirón

  • Affiliations:
  • University of Groningen, Groningen, The Netherlands;University of Groningen, Groningen, The Netherlands

  • Venue:
  • MWE '07 Proceedings of the Workshop on a Broader Perspective on Multiword Expressions
  • Year:
  • 2007

Abstract

This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within an MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributional similarity measures), which gives us clusters of semantically related nouns. Next, a number of statistical measures, based on selectional preferences, are developed that formalize the intuition of non-compositionality. Our approach has been tested on Dutch, and automatically evaluated using Dutch lexical resources.
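
The substitution intuition can be illustrated with a minimal sketch. This is not one of the paper's measures, which the abstract does not spell out; it is a simplified stand-in that assumes verb-noun co-occurrence counts and a noun cluster are already available. The function name `cluster_share`, the English toy counts, and the cluster are all invented for illustration. The idea is that if one noun takes almost all of a verb's co-occurrences within its cluster, the semantically similar nouns are evidently not acceptable substitutes, which hints at a fixed expression.

```python
from collections import Counter


def cluster_share(pair_counts, verb, noun, cluster):
    """Return the fraction of the verb's co-occurrences with nouns in
    `cluster` that fall on `noun` (hypothetical helper, not from the paper)."""
    total = sum(pair_counts[(verb, n)] for n in cluster)
    if total == 0:
        return 0.0  # the verb never occurs with any noun of this cluster
    return pair_counts[(verb, noun)] / total


# Invented toy counts; a real run would use counts from a parsed corpus.
pair_counts = Counter({
    ("kick", "bucket"): 18, ("kick", "pail"): 1, ("kick", "tub"): 1,
    ("fill", "bucket"): 10, ("fill", "pail"): 9, ("fill", "tub"): 11,
})

# Assumed output of a distributional noun clustering step.
cluster = {"bucket", "pail", "tub"}

idiomatic = cluster_share(pair_counts, "kick", "bucket", cluster)
literal = cluster_share(pair_counts, "fill", "bucket", cluster)
print(f"kick + bucket: {idiomatic:.2f}")
print(f"fill + bucket: {literal:.2f}")
```

In this toy data, "bucket" dominates its cluster after "kick" but shares it roughly evenly after "fill", so only the first pair would be flagged as a candidate. A raw share like this ignores how frequent each noun is overall, which is one reason a preference-based statistical measure is the more principled choice.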