Neural based approach to keyword extraction from documents

Authors:
Taeho Jo
Affiliations:
SITE, University of Ottawa, Ottawa, Ontario, Canada
Venue:
ICCSA'03 Proceedings of the 2003 international conference on Computational science and its applications: Part I
Year:
2003

Abstract

Documents are unstructured data written in natural language. A document surrogate is the structured data into which an original document is converted so that computer systems can process it, and it is usually represented as a list of words. Because not all words in a document reflect its content, the important words related to its content must be selected from among them. Such important words are called keywords, and they are typically selected with an equation based on TF (Term Frequency) and IDF (Inverse Document Frequency). However, not only TF and IDF but also the position of each word in the document and whether the word appears in the title should be considered when selecting keywords from the words in the text. An equation that combines all of these factors becomes too complicated to apply to keyword selection. This paper proposes a neural network model trained with backpropagation, in which these factors are used as features to generate a feature vector for each word and with which keywords are selected. The paper shows that backpropagation outperforms the equation in distinguishing keywords.
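
The idea in the abstract can be illustrated with a minimal sketch: each word is mapped to a feature vector built from TF, IDF, position, and title inclusion, and a small network trained with backpropagation scores it as keyword or non-keyword. The abstract does not give the feature definitions, normalization, network size, or training settings, so everything below is an assumption: the names `word_features` and `TinyMLP`, the smoothed IDF, the single hidden layer of three sigmoid units, the learning rate, and the toy document and labels are all invented for illustration and are not the paper's method.

```python
import math
import random

def word_features(word, doc_tokens, title_tokens, corpus):
    """Return [tf, idf, position, in_title] for one word (assumed definitions)."""
    tf = doc_tokens.count(word) / len(doc_tokens)
    df = sum(1 for d in corpus if word in d)            # documents containing the word
    idf = math.log((1 + len(corpus)) / (1 + df))        # smoothed IDF (assumption)
    position = 1.0 - doc_tokens.index(word) / len(doc_tokens)  # first occurrence; earlier -> nearer 1
    in_title = 1.0 if word in title_tokens else 0.0
    return [tf, idf, position, in_title]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer, one sigmoid output, trained by plain backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # The last weight in every row is the bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        # Gradient of 0.5 * (y - target)^2 with respect to the output pre-activation.
        delta_out = (y - target) * y * (1.0 - y)
        # Propagate the error back to the hidden units before updating w2.
        delta_hidden = [delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                        for j in range(len(h))]
        self.w2 = [w - lr * delta_out * v for w, v in zip(self.w2, hb)]
        for j, row in enumerate(self.w1):
            self.w1[j] = [w - lr * delta_hidden[j] * v for w, v in zip(row, xb)]
        return 0.5 * (y - target) ** 2

# Toy data, invented for illustration only.
title = ["neural", "keyword", "extraction"]
doc = ["the", "neural", "network", "selects", "the", "keyword", "from", "the", "text"]
corpus = [doc, ["the", "text", "is", "long"], ["the", "network", "is", "deep"]]
labeled_keywords = {"neural", "keyword"}   # hypothetical hand labels

samples = [(word_features(w, doc, title, corpus),
            1.0 if w in labeled_keywords else 0.0)
           for w in sorted(set(doc))]

net = TinyMLP(n_in=4, n_hidden=3)
epoch_losses = []
for _ in range(2000):
    epoch_losses.append(sum(net.train_step(x, t) for x, t in samples))

score_keyword = net.forward(word_features("keyword", doc, title, corpus))[1]
score_the = net.forward(word_features("the", doc, title, corpus))[1]
print(f"keyword: {score_keyword:.3f}  the: {score_the:.3f}")
```

In this sketch the network output can be read as a keyword score, and a word would be selected when its score exceeds a chosen threshold such as 0.5. A realistic setup would need labeled keywords from many documents and a held-out set to compare against a TF-IDF equation, which the toy data above does not attempt.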