A Robust Meaning Extraction Methodology Using Supervised Neural Networks

Authors:
D. A. Karras; B. G. Mertzios
Venue:
AI '02 Proceedings of the 15th Australian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
Year:
2002

Abstract

A large amount of information, stored in intranet and Internet databases and accessed through the World Wide Web, is organized in the form of full-text documents. Efficiently retrieving this information according to its meaning and content is an important problem for data mining systems that create, manage and query such very large information bases. In this paper we address the main aspect of extracting meaning from documents, namely text categorization, and outline a novel, systematic approach to its solution. We present a text categorization system for non-domain-specific full-text documents that is based on the learning and generalization capabilities of neural networks. The main contribution of this paper lies in the feature extraction methodology. First, it uses word semantic categories rather than the raw words used by rival approaches. Second, as part of addressing the dimensionality reduction problem, it introduces a novel second-order approach to text categorization feature extraction based on co-occurrence analysis of word semantic categories. The suggested methodology compares favorably with widely accepted techniques based on raw word frequencies on a collection of documents categorized according to the Dewey Decimal Classification (DDC) system. These comparisons involve several Multilayer Perceptron (MLP) algorithms as well as the Support Vector Machine (SVM), Learning Vector Quantization (LVQ) and the conventional k-nearest-neighbour (k-NN) technique.
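The contrast between first-order raw word frequencies and second-order semantic-category co-occurrence features can be illustrated with a small sketch. The abstract does not specify the semantic category inventory, the co-occurrence window, or the counting scheme, so everything below is an assumption: `LEXICON` is a hypothetical toy word-to-category mapping, co-occurrence is counted over a hypothetical sliding `window` of category tokens, and all function names are illustrative rather than the authors' method.

```python
from collections import Counter

# Hypothetical word -> semantic-category lexicon (assumption: the paper's
# actual category inventory is not given in the abstract).
LEXICON = {
    "bank": "finance", "loan": "finance", "money": "finance",
    "river": "nature", "tree": "nature", "water": "nature",
    "court": "law", "judge": "law",
}
CATEGORIES = sorted(set(LEXICON.values()))


def raw_word_frequencies(tokens):
    """Baseline first-order features: raw word counts (one dimension per word)."""
    return Counter(tokens)


def category_sequence(tokens):
    """Replace each known word by its semantic category; drop unknown words."""
    return [LEXICON[t] for t in tokens if t in LEXICON]


def category_cooccurrence_features(tokens, window=3):
    """Second-order features: counts of unordered category pairs that
    co-occur within `window` consecutive category tokens (assumed scheme)."""
    cats = category_sequence(tokens)
    counts = Counter()
    for i, a in enumerate(cats):
        # Pair category i with the next (window - 1) category tokens.
        for b in cats[i + 1 : i + window]:
            counts[tuple(sorted((a, b)))] += 1
    # Fixed-length vector over all unordered pairs, including a category
    # paired with itself: C * (C + 1) / 2 dimensions for C categories.
    pairs = [(a, b) for i, a in enumerate(CATEGORIES) for b in CATEGORIES[i:]]
    return [counts[p] for p in pairs]


if __name__ == "__main__":
    doc = "the bank gave a loan near the river".split()
    print(category_cooccurrence_features(doc))  # [1, 0, 2, 0, 0, 0]
```

In this sketch the length of the co-occurrence vector depends only on the number of categories (here 3 categories give 6 pair features), not on the vocabulary size, which is one way the category-based representation can keep the feature space small compared with raw word counts. Either vector could then be fed to any of the classifiers mentioned above.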