A broad evaluation of techniques for automatic acquisition of multiword expressions

  • Authors:
  • Carlos Ramisch; Vitor De Araujo; Aline Villavicencio

  • Affiliations:
  • Federal University of Rio Grande do Sul (Brazil) and GETALP-LIG, University of Grenoble (France); Federal University of Rio Grande do Sul (Brazil); Federal University of Rio Grande do Sul (Brazil)

  • Venue:
  • ACL '12 Proceedings of ACL 2012 Student Research Workshop
  • Year:
  • 2012

Abstract

Several approaches have been proposed for the automatic acquisition of multiword expressions from corpora. However, there is no agreement about which of them offers the best cost-benefit ratio, as they have been evaluated on distinct datasets and/or languages. To address this issue, we compare these techniques along the following dimensions: expression type (compound nouns, phrasal verbs), language (English, French) and corpus size. Results show that these techniques tend to extract similar candidate lists, with high recall (~80%) for nominals and high precision (~70%) for verbals. Using association measures for candidate filtering is useful, but some of them are more onerous to compute and not significantly better than raw counts. We finish with an evaluation of flexibility and an indication of which technique is recommended for each language-type-size context.
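
As an illustration of the contrast the abstract draws between raw counts and association measures, the following minimal sketch scores adjacent word pairs by frequency and by pointwise mutual information (PMI). PMI is chosen here only as a familiar example of an association measure; the abstract does not say which measures or tools were evaluated, and the function name `score_bigrams` and the toy token list are assumptions made for this sketch.

```python
import math
from collections import Counter


def score_bigrams(tokens):
    """Return {bigram: (raw_count, pmi)} for adjacent token pairs.

    Hypothetical helper for illustration; not the method of the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Observed probability of the pair vs. probability under independence.
        p_joint = count / n_bi
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scores[(w1, w2)] = (count, math.log2(p_joint / p_indep))
    return scores


# Toy corpus (an assumption, not data from the paper).
tokens = "hot dog hot dog big dog".split()
scores = score_bigrams(tokens)

# Rank candidates by raw count and by PMI to compare the two filters.
by_count = sorted(scores, key=lambda b: scores[b][0], reverse=True)
by_pmi = sorted(scores, key=lambda b: scores[b][1], reverse=True)
```

On a corpus this small the two rankings carry little information; the sketch only shows that raw counts require a single pass, whereas an association measure needs the additional unigram statistics.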