You can't beat frequency (unless you use linguistic knowledge): a qualitative evaluation of association measures for collocation and term extraction

  • Authors:
  • Joachim Wermter; Udo Hahn

  • Affiliations:
  • Jena University Language & Information Engineering (JULIE) Lab, Jena, Germany (both authors)

  • Venue:
  • ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
  • Year:
  • 2006

Abstract

In recent years, a number of lexical association measures have been studied as a means of extracting new scientific terminology or general-language collocations. The implicit assumption behind this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of co-occurrence frequencies. Here we test this assumption explicitly. Using four qualitative criteria, we show that purely statistics-based measures reveal virtually no difference compared with frequency-of-occurrence counts, whereas linguistically more informed metrics do reveal a marked difference.
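
To make the comparison in the abstract concrete, here is a minimal Python sketch that ranks bigram candidates two ways: by raw co-occurrence frequency and by a t-score. The t-score is used here only as a generic example of a statistics-based association measure; this record does not say which measures the authors evaluated. The toy token list and the helper names `bigram_scores` and `rank` are hypothetical, and the sketch does not reproduce the paper's four qualitative criteria.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {bigram: (frequency, t_score)} for adjacent word pairs.

    Assumption: t = (O - E) / sqrt(O), where O is the observed bigram
    frequency and E = f(w1) * f(w2) / N is the frequency expected under
    independence, with N the number of bigram tokens.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    scores = {}
    for (w1, w2), observed in bigrams.items():
        expected = unigrams[w1] * unigrams[w2] / n
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (observed, t_score)
    return scores


def rank(scores, index):
    """Sort bigrams by one score (0 = frequency, 1 = t-score), best first."""
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda bg: (-scores[bg][index], bg))


# Hypothetical toy corpus, for illustration only.
tokens = "the cell and the cell or the cell stem gene".split()
scores = bigram_scores(tokens)
by_frequency = rank(scores, 0)
by_t_score = rank(scores, 1)
```

On this toy input both rankings put ("the", "cell") first, since the t-score depends heavily on the observed frequency. That is only an illustration of why a statistical measure can track frequency closely, not evidence for the paper's findings.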