Automatic identification of non-compositional multi-word expressions using latent semantic analysis

  • Authors:
  • Graham Katz; Eugenie Giesbrecht

  • Affiliations:
  • University of Osnabrück; University of Osnabrück

  • Venue:
  • MWE '06 Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
  • Year:
  • 2006

Abstract

Using latent semantic analysis, we explore the hypothesis that local linguistic context can serve to identify multi-word expressions (MWEs) that have non-compositional meanings. We propose that the vector similarity between the distribution vectors associated with an MWE as a whole and those associated with its constituent parts can serve as a good measure of the degree to which the MWE is compositional. We present experiments showing that low (cosine) similarity does, in fact, correlate with non-compositionality.
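
The measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the count matrix `COUNTS`, the term and context labels, and the helper names (`cosine`, `lsa_embed`, `compositionality_score`) are all hypothetical; the counts are made-up toy values, not data from the paper; and the constituent vectors are combined by simple vector addition, which is one plausible choice rather than something the abstract specifies. The MWE is treated as a single token with its own row, the matrix is reduced with a truncated SVD (the core step of latent semantic analysis), and the MWE vector is compared with the combined constituent vector by cosine.

```python
import numpy as np

# Hypothetical toy term-by-context co-occurrence counts (invented for illustration).
TERMS = ["kick", "bucket", "kick_the_bucket", "red", "car", "red_car"]
CONTEXTS = ["ball", "pail", "death", "paint", "drive", "road"]
COUNTS = np.array([
    [6, 1, 0, 0, 0, 0],   # kick
    [1, 6, 0, 1, 0, 0],   # bucket
    [0, 0, 7, 0, 0, 1],   # kick_the_bucket (MWE counted as one token)
    [1, 0, 0, 6, 1, 0],   # red
    [0, 0, 0, 1, 6, 5],   # car
    [0, 0, 0, 3, 5, 4],   # red_car (MWE counted as one token)
], dtype=float)
INDEX = {term: i for i, term in enumerate(TERMS)}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return float(np.dot(u, v) / (norm_u * norm_v))


def lsa_embed(counts, k):
    """Project each row (term) onto the top-k singular dimensions of the matrix."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def compositionality_score(embedding, index, mwe, parts):
    """Cosine between the MWE vector and the sum of its constituents' vectors.

    Summing the constituent vectors is an assumption of this sketch.
    """
    composed = np.sum([embedding[index[p]] for p in parts], axis=0)
    return cosine(embedding[index[mwe]], composed)


if __name__ == "__main__":
    emb = lsa_embed(COUNTS, k=3)  # k is a free parameter; 3 is arbitrary here
    for mwe, parts in [("kick_the_bucket", ["kick", "bucket"]),
                       ("red_car", ["red", "car"])]:
        print(mwe, round(compositionality_score(emb, INDEX, mwe, parts), 3))
```

Under the hypothesis in the abstract, a lower score would point toward a non-compositional reading; any decision threshold, the number of retained dimensions, and the choice of context window would have to be tuned on real corpus data.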