Documents are unstructured data consisting of natural language. A document surrogate is the structured representation derived from an original document so that it can be processed by computer systems; it is usually represented as a list of words. Because not all words in a document reflect its content, it is necessary to select the important, content-bearing words among them. Such words are called keywords, and they are traditionally selected with an equation based on TF (Term Frequency) and IDF (Inverse Document Frequency). In practice, not only TF and IDF but also the position of each word in the document and whether the word appears in the title should be considered when selecting keywords. An equation that combines all of these factors becomes too complicated to apply to keyword selection. This paper proposes a back-propagation neural network model in which these factors are used as features to generate feature vectors, and with which keywords are selected. We show that back propagation outperforms the equation-based approach in distinguishing keywords.
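To make the contrast concrete, the following is a minimal sketch (illustrative only, not the paper's implementation) of the two ingredients the abstract describes: scoring words by a TF-IDF equation, and building the per-word feature vectors (TF, IDF, word position, title inclusion) that would feed a back-propagation network. All function names and the smoothed IDF formula are assumptions for illustration.

```python
import math

def tf_idf_scores(doc_tokens, corpus):
    """Score each distinct word in doc_tokens by TF * IDF over a small corpus."""
    n_docs = len(corpus)
    scores = {}
    for word in set(doc_tokens):
        tf = doc_tokens.count(word) / len(doc_tokens)        # term frequency
        df = sum(1 for d in corpus if word in d)             # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1          # smoothed IDF (one common variant)
        scores[word] = tf * idf
    return scores

def feature_vector(word, doc_tokens, title_tokens, corpus):
    """The four features the abstract lists: TF, IDF, position, title inclusion."""
    n_docs = len(corpus)
    tf = doc_tokens.count(word) / len(doc_tokens)
    df = sum(1 for d in corpus if word in d)
    idf = math.log((1 + n_docs) / (1 + df)) + 1
    first_pos = doc_tokens.index(word) / len(doc_tokens)     # relative first position
    in_title = 1.0 if word in title_tokens else 0.0          # title inclusion flag
    return [tf, idf, first_pos, in_title]

# Toy corpus of tokenized documents.
corpus = [
    ["neural", "networks", "learn", "features"],
    ["keyword", "extraction", "from", "text"],
    ["text", "retrieval", "uses", "keyword", "weights"],
]
doc = corpus[2]
title = ["keyword", "weighting"]

scores = tf_idf_scores(doc, corpus)
fv = feature_vector("keyword", doc, title, corpus)
```

A classifier trained on such feature vectors replaces the hand-tuned combining equation: instead of deciding by hand how TF, IDF, position, and title inclusion should be weighted, the network learns those weights from labeled keyword/non-keyword examples.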