An n-gram frequency database reference to handle MWE extraction in NLP applications

  • Authors: Patrick Watrin; Thomas François

  • Affiliations: Centre for Natural Language Processing, Institut Langage et Communication, UCLouvain; Aspirant F. N. R. S., Centre for Natural Language Processing, Institut Langage et Communication, UCLouvain

  • Venue: MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
  • Year: 2011

Abstract

The identification and extraction of Multiword Expressions (MWEs) currently deliver satisfactory results. However, integrating these results into a wider application remains an issue, for two main reasons: the association measures (AMs) used to detect MWEs require a critical amount of data, and MWE dictionaries cannot account for all the lexical and syntactic variations inherent in MWEs. In this study, we use an alternative technique to overcome these limitations. It consists of defining an n-gram frequency database that can be used to compute AMs on the fly, allowing the extraction procedure to efficiently process all the MWEs in a text, even if they have not been previously observed.
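
The idea of computing AMs on the fly from a reference database of n-gram frequencies can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which AMs or which storage backend are used, so the class name `NgramFrequencyDB`, the in-memory `Counter` storage, the helper `score_bigrams`, and the choice of pointwise mutual information (PMI) as the association measure are all hypothetical.

```python
import math
from collections import Counter


class NgramFrequencyDB:
    """Toy in-memory stand-in for an n-gram frequency database (hypothetical)."""

    def __init__(self, max_n=2):
        self.max_n = max_n
        self.counts = Counter()  # maps a tuple of tokens to its frequency
        self.totals = Counter()  # maps n to the number of n-grams observed

    def add_text(self, tokens):
        """Add the counts of all n-grams (1 <= n <= max_n) from a reference text."""
        for n in range(1, self.max_n + 1):
            for i in range(len(tokens) - n + 1):
                self.counts[tuple(tokens[i:i + n])] += 1
                self.totals[n] += 1

    def freq(self, ngram):
        """Return the stored frequency of an n-gram (0 if it was never seen)."""
        return self.counts[tuple(ngram)]

    def pmi(self, w1, w2):
        """PMI of a bigram from stored counts; None if any count is zero.

        PMI is an assumed choice of AM, used here only for illustration.
        """
        f12 = self.freq((w1, w2))
        f1 = self.freq((w1,))
        f2 = self.freq((w2,))
        if f12 == 0 or f1 == 0 or f2 == 0:
            return None
        p12 = f12 / self.totals[2]
        p1 = f1 / self.totals[1]
        p2 = f2 / self.totals[1]
        return math.log2(p12 / (p1 * p2))


def score_bigrams(db, tokens):
    """Score every adjacent token pair of a new text against the reference counts."""
    return [((a, b), db.pmi(a, b)) for a, b in zip(tokens, tokens[1:])]


# Build the reference counts once, then score candidates from any new text.
db = NgramFrequencyDB()
db.add_text("hot dog hot dog big dog".split())
scores = score_bigrams(db, "hot dog".split())
```

Because the score is derived from stored reference frequencies rather than from the text being processed, a candidate can be scored even when the input text is too short to support an AM on its own, and without the candidate appearing in any MWE dictionary. In this sketch, a pair whose counts are missing from the reference data simply gets `None`.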