Neural based approach to keyword extraction from documents

  • Authors:
  • Taeho Jo

  • Affiliations:
  • SITE, University of Ottawa, Ottawa, Ontario, Canada

  • Venue:
  • ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part I
  • Year:
  • 2003

Abstract

Documents are unstructured data written in natural language. A document surrogate is the structured data into which an original document is converted so that it can be processed by computer systems, and it is usually represented as a list of words. Because not all words in a document reflect its content, it is necessary to select the important words that relate to its content. Such important words are called keywords, and they are usually selected with an equation based on TF (Term Frequency) and IDF (Inverse Document Frequency). However, not only TF and IDF but also the position of each word in the document and whether the word appears in the title should be considered when selecting keywords from the words in the text. An equation based on all of these factors becomes too complicated to apply to keyword selection. This paper proposes a neural network model, backpropagation, in which these factors are used as features to generate feature vectors, and with which keywords are selected. The paper shows that backpropagation outperforms the equation in distinguishing keywords.
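
The idea in the abstract can be illustrated with a minimal sketch: each word is turned into a feature vector of TF, IDF, position, and title inclusion, and a small network trained with backpropagation maps that vector to a keyword score. The abstract does not give the feature definitions, the network size, or the training settings, so everything specific here is an assumption made for illustration: the names `word_features` and `TinyMLP`, the smoothed IDF formula, the position scaling, the single hidden layer of three sigmoid units, the squared-error loss, the learning rate, and the made-up toy labels.

```python
import math
import random


def word_features(word, doc_tokens, title_tokens, corpus):
    """Return [tf, idf, position, in_title] for a word that occurs in doc_tokens."""
    tf = doc_tokens.count(word) / len(doc_tokens)
    df = sum(1 for d in corpus if word in d)  # documents containing the word
    idf = math.log((1 + len(corpus)) / (1 + df))  # smoothed IDF (assumption)
    # First occurrence scaled so that earlier words score closer to 1 (assumption).
    position = 1.0 - doc_tokens.index(word) / len(doc_tokens)
    in_title = 1.0 if word in title_tokens else 0.0
    return [tf, idf, position, in_title]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, one sigmoid output, trained by plain backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in every row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One stochastic gradient step; returns the squared-error loss before it."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights as they were before the update.
        delta_hid = [delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                     for j in range(len(h))]
        for j, v in enumerate(h + [1.0]):
            self.w2[j] -= lr * delta_out * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hid[j] * v
        return 0.5 * (y - target) ** 2


# Toy usage with made-up data and labels (1.0 = keyword, 0.0 = not a keyword).
doc = "neural network selects keywords neural".split()
title = "neural keywords".split()
corpus = [set(doc), {"network", "data"}, {"data"}]
labels = {"neural": 1.0, "keywords": 1.0, "network": 0.0, "selects": 0.0}
samples = [(word_features(w, doc, title, corpus), t) for w, t in labels.items()]

net = TinyMLP(n_in=4, n_hidden=3)
for _ in range(2000):
    for x, t in samples:
        net.train_step(x, t)
```

After training, `net.forward(features)[1]` gives a score between 0 and 1 that can be thresholded to decide whether a word is a keyword. The point of the design is that the network learns how to weight the four factors from labeled examples, instead of a hand-built equation fixing that combination in advance.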