Automatic keyphrase extraction from scientific documents using N-gram filtration technique

  • Authors:
  • Niraj Kumar;Kannan Srinathan

  • Affiliations:
  • IIIT-Hyderabad, Hyderabad, India;IIIT-Hyderabad, Hyderabad, India

  • Venue:
  • Proceedings of the eighth ACM symposium on Document engineering
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we present an automatic Keyphrase extraction technique for English documents of scientific domain. The devised algorithm uses n-gram filtration technique, which filters sophisticated n-grams {1dnd4} along with their weight from the words of input document. To develop n-gram filtration technique, we have used (1) LZ78 data compression based technique, (2) a simple refinement step, (3) A simple Pattern Filtration algorithm and, (4) a term weighting scheme. In term weighting scheme, we have introduced the importance of position of sentence (where given phrase occurs first) in document and position of phrase in sentence for documents of scientific domain (which is literally more organized than other domains). The entire system is based upon statistical observations, simple grammatical facts, heuristics, and lexical information of English language. We remark that the devised system does not require a learning phase. Our experimental results with publically available text dataset, shows that the devised system is comparable with other known algorithms.