A hybrid approach for multiword expression identification

  • Authors:
  • Carlos Ramisch;Helena de Medeiros Caseli;Aline Villavicencio;André Machado;Maria José Finatto

  • Affiliations:
  • GETALP/LIG, University of Grenoble, (France);Department of Computer Science, Federal University of São Carlos, (Brazil);Institute of Informatics, Federal University of Rio Grande do Sul, (Brazil);Institute of Informatics, Federal University of Rio Grande do Sul, (Brazil);Institute of Language and Linguistics, Federal University of Rio Grande do Sul, (Brazil)

  • Venue:
  • PROPOR'10 Proceedings of the 9th international conference on Computational Processing of the Portuguese Language
  • Year:
  • 2010

Abstract

Considerable attention has been given to the identification and treatment of Multiword Expressions (MWEs) as a way to improve the quality of NLP tasks such as parsing and generation. Statistical methods have often been employed for MWE identification as an inexpensive, language-independent way of finding co-occurrence patterns. More linguistically motivated methods, which employ information such as POS filters and lexical alignment between languages, can produce more targeted candidate lists. In this paper we propose a hybrid approach that combines the strengths of these different sources of information using a machine learning algorithm to produce more robust and precise results. Automatic evaluation on gold standards shows that our hybrid method outperforms the individual statistical and alignment-based MWE extraction approaches for both Portuguese and English. This method can be used to aid lexicographic work by providing a more targeted MWE candidate list.
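
The combination of information sources described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the association measures, POS patterns, or learning algorithm used, so everything here is an assumption made for illustration. Pointwise mutual information (PMI) stands in for the statistical evidence, a set of allowed POS-tag pairs stands in for the POS filter, and a set of bigrams flagged by a word aligner stands in for the alignment-based evidence. The function name `candidate_features` and its parameters are hypothetical.

```python
import math
from collections import Counter


def candidate_features(tokens, pos_tags, aligned_candidates, allowed_patterns):
    """Build one feature dict per bigram candidate (illustrative only).

    tokens             -- list of word forms from a corpus
    pos_tags           -- list of POS tags, parallel to tokens
    aligned_candidates -- set of bigrams proposed by an alignment-based
                          method (assumed to be computed elsewhere)
    allowed_patterns   -- set of (tag1, tag2) pairs accepted by the POS filter
    """
    n = len(tokens)
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    # Remember the POS pattern of the first occurrence of each bigram.
    pos_of = {}
    for bigram, tags in zip(zip(tokens, tokens[1:]), zip(pos_tags, pos_tags[1:])):
        pos_of.setdefault(bigram, tags)

    features = {}
    for (w1, w2), count in bigram_counts.items():
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities
        # approximated by relative frequencies over n tokens.
        pmi = math.log2(count * n / (unigram_counts[w1] * unigram_counts[w2]))
        features[(w1, w2)] = {
            "pmi": pmi,                                        # statistical evidence
            "pos_match": pos_of[(w1, w2)] in allowed_patterns,  # POS filter
            "aligned": (w1, w2) in aligned_candidates,          # alignment evidence
        }
    return features


# Toy usage with made-up data.
tokens = ["hot", "dog", "is", "a", "hot", "dog"]
pos_tags = ["ADJ", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
feats = candidate_features(
    tokens,
    pos_tags,
    aligned_candidates={("hot", "dog")},
    allowed_patterns={("ADJ", "NOUN"), ("NOUN", "NOUN")},
)
```

In a hybrid setup of this kind, each feature dict would be paired with a gold-standard label (MWE or not) and passed to a classifier, so that the learner, rather than a hand-set threshold, decides how much weight each source of evidence receives.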