Multilingual collocation extraction: issues and solutions

  • Authors: Violeta Seretan; Eric Wehrli
  • Affiliations: University of Geneva, Geneva; University of Geneva, Geneva
  • Venue: MLRI '06 Proceedings of the Workshop on Multilingual Language Resources and Interoperability
  • Year: 2006

Abstract

Although traditionally seen as a language-independent task, collocation extraction now relies increasingly on linguistic preprocessing of texts (e.g., lemmatization, POS tagging, chunking, or parsing) before statistical measures are applied. This paper provides a language-oriented review of existing extraction work. It points out several language-specific issues related to extraction and proposes a strategy for coping with them. It then describes a hybrid extraction system based on a multilingual parser. Finally, it presents a case study on the performance of an association measure across a number of languages.