A Statistical Corpus-Based Term Extractor

  • Authors:
  • Patrick Pantel;Dekang Lin

  • Venue:
  • AI '01 Proceedings of the 14th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2001

Abstract

Term extraction is an important problem in natural language processing. In this paper, we propose a language-independent, statistical, corpus-based term extraction algorithm. In previous approaches, evaluation has been subjective, at best relying on a lexicographer's judgement. We evaluate our term extractor in two ways. First, we assess its predictiveness on an unseen corpus using perplexity. Second, we measure its precision and recall by comparing the Chinese words in a segmented corpus with the words extracted by our system.
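
The precision and recall comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `precision_recall`, the toy data, and the choice to compare sets of distinct words (rather than token occurrences) are all assumptions made for illustration.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted terms against reference words.

    Assumption: both inputs are compared as sets of distinct strings.
    """
    extracted_set = set(extracted)
    reference_set = set(reference)
    correct = extracted_set & reference_set

    # Guard against empty inputs to avoid division by zero.
    precision = len(correct) / len(extracted_set) if extracted_set else 0.0
    recall = len(correct) / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical toy data: a hand-segmented sentence and a system's output.
segmented_corpus = "我们 喜欢 自然 语言 处理".split()
extracted_terms = ["我们", "自然", "语言", "言处"]

p, r = precision_recall(extracted_terms, segmented_corpus)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here three of the four extracted strings appear among the five reference words, so precision is 0.75 and recall is 0.60.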