Improved automatic keyword extraction given more linguistic knowledge

  • Authors: Anette Hulth
  • Affiliations: Stockholm University, Sweden
  • Venue: EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing
  • Year: 2003

Abstract

This paper discusses experiments on the automatic extraction of keywords from abstracts using a supervised machine learning algorithm. The main point is that adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), yields better results, as measured against keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives better precision than extracting n-grams, and adding the PoS tag(s) assigned to the term as a feature dramatically improves the results, independent of the term selection approach applied.
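
To make the contrast between the two term selection approaches concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hand-tagged toy sentence with Penn Treebank style tags, uses a crude "adjectives followed by nouns" pattern as a stand-in for a real NP chunker, and builds a hypothetical feature record holding the term frequency and the PoS tag sequence. All function names and the feature layout are illustrative assumptions.

```python
from collections import Counter

# Toy input: tokens with hand-assigned PoS tags (an assumption for this
# sketch; a real system would obtain them from a tagger).
TAGGED = [
    ("automatic", "JJ"), ("keyword", "NN"), ("extraction", "NN"),
    ("uses", "VBZ"), ("a", "DT"), ("supervised", "JJ"),
    ("learning", "NN"), ("algorithm", "NN"),
]


def ngram_candidates(tagged, max_n=3):
    """Return every contiguous token sequence of length 1..max_n."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tagged) - n + 1):
            out.append(tagged[i:i + n])
    return out


def np_chunk_candidates(tagged):
    """Return maximal runs matching JJ* NN+ (a crude NP-chunk stand-in)."""
    out, i = [], 0
    while i < len(tagged):
        j = i
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1  # skip leading adjectives
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1  # consume the noun run
        if k > j:  # at least one noun was found, so emit the chunk
            out.append(tagged[i:k])
            i = k
        else:
            i = max(j, i + 1)
    return out


def term_text(candidate):
    """Join the words of a candidate into a single term string."""
    return " ".join(word for word, _ in candidate)


def features(candidate, term_counts):
    """Hypothetical feature record: the term, its frequency, its PoS tags."""
    return {
        "term": term_text(candidate),
        "tf": term_counts[term_text(candidate)],
        "pos": "_".join(tag for _, tag in candidate),
    }


ngrams = ngram_candidates(TAGGED)
chunks = np_chunk_candidates(TAGGED)
counts = Counter(term_text(c) for c in ngrams)

for chunk in chunks:
    print(features(chunk, counts))
```

On this toy input the n-gram selector proposes far more candidates than the chunk selector, many of them not linguistically well-formed terms (for example "uses a"), which is one intuition for why chunk-based selection could be more precise. The `pos` field shows how a tag sequence can be attached to a candidate under either selection approach.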