Paradigmatic modifiability statistics for the extraction of complex multi-word terms

  • Authors:
  • Joachim Wermter;Udo Hahn

  • Affiliations:
  • Jena University, Jena, Germany;Jena University, Jena, Germany

  • Venue:
  • HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We here propose a new method which sets apart domain-specific terminology from common non-specific noun phrases. It is based on the observation that terminological multi-word groups reveal a considerably lesser degree of distributional variation than non-specific noun phrases. We define a measure for the observable amount of paradigmatic modifiability of terms and, subsequently, test it on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using a community-wide curated biomedical terminology system as an evaluation gold standard, we show that our algorithm significantly outperforms a variety of standard term identification measures. We also provide empirical evidence that our methodolgy is essentially domain- and corpus-size-independent.