A large amount of information, stored in intranets and Internet databases and accessed through the World Wide Web, is organized in the form of full-text documents. Efficient retrieval of this information with regard to its meaning and content is an important problem in data mining systems for the creation, management, and querying of such very large information bases. In this paper we deal with the main aspect of the problem of extracting meaning from documents, namely text categorization, and outline a novel and systematic approach to its solution. We present a text categorization system for non-domain-specific full-text documents based on the learning and generalization capabilities of neural networks. The main contribution of this paper lies in the feature extraction methodology, which, first, uses word semantic categories rather than the raw words used by rival approaches. As a consequence of addressing the problem of dimensionality reduction, the proposed approach also introduces a novel second-order approach to feature extraction for text categorization, based on co-occurrence analysis of word semantic categories. The suggested methodology compares favorably with widely accepted techniques based on raw word frequency on a collection of documents concerning the Dewey Decimal Classification (DDC) system. These comparisons involve different Multilayer Perceptron (MLP) algorithms as well as the Support Vector Machine (SVM), LVQ, and the conventional k-NN technique.
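The two kinds of features mentioned above can be illustrated with a minimal Python sketch. It is not the paper's actual procedure, which the abstract does not specify. The lexicon `LEXICON`, the category names, the co-occurrence `window`, and the function names are all hypothetical choices made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical word-to-category lexicon (assumption: the abstract does not
# say which lexicon or which semantic categories the authors used).
LEXICON = {
    "neuron": "biology", "cell": "biology",
    "network": "computing", "algorithm": "computing",
    "theorem": "mathematics", "proof": "mathematics",
}
CATEGORIES = sorted(set(LEXICON.values()))
# Unordered pairs of distinct categories, in a fixed order.
PAIRS = list(combinations(CATEGORIES, 2))


def category_sequence(text):
    """Map each known word to its semantic category, dropping unknown words.

    Tokenization is a plain whitespace split, a deliberate simplification.
    """
    return [LEXICON[w] for w in text.lower().split() if w in LEXICON]


def first_order_features(text):
    """Count how often each semantic category occurs in the text."""
    counts = Counter(category_sequence(text))
    return [counts[c] for c in CATEGORIES]


def second_order_features(text, window=2):
    """Count co-occurrences of distinct categories within a sliding window.

    The window size is an assumption; same-category pairs are skipped.
    """
    seq = category_sequence(text)
    counts = Counter()
    for i, a in enumerate(seq):
        for b in seq[i + 1 : i + 1 + window]:
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return [counts[p] for p in PAIRS]


if __name__ == "__main__":
    doc = "the neuron network algorithm proof"
    print(first_order_features(doc))
    print(second_order_features(doc))
```

With a small category inventory, both feature vectors stay far shorter than a raw word-frequency vector over the full vocabulary, which is the dimensionality argument the abstract makes. Either vector could then be passed to a classifier such as an MLP, SVM, or k-NN.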