A Robust Meaning Extraction Methodology Using Supervised Neural Networks

  • Authors:
  • D. A. Karras; B. G. Mertzios

  • Venue:
  • AI '02 Proceedings of the 15th Australian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2002

Abstract

A large amount of the information stored in intranets and Internet databases and accessed through the World Wide Web is organized as full-text documents. Retrieving this information efficiently with regard to its meaning and content is an important problem in data mining systems for creating, managing, and querying such very large information bases. In this paper we deal with the main aspect of the problem of extracting meaning from documents, namely text categorization, and outline a novel and systematic approach to its solution. We present a text categorization system for non-domain-specific full-text documents based on the learning and generalization capabilities of neural networks. The main contribution of this paper lies in the feature extraction methodology, which, first, uses word semantic categories rather than the raw words used by rival approaches. Second, as a consequence of coping with the problem of dimensionality reduction, the proposed approach introduces a novel second-order approach to feature extraction for text categorization, based on co-occurrence analysis of word semantic categories. The suggested methodology compares favorably with widely accepted techniques based on raw word frequency on a collection of documents concerning the Dewey Decimal Classification (DDC) system. The comparisons involve different Multilayer Perceptron (MLP) algorithms as well as the Support Vector Machine (SVM), Learning Vector Quantization (LVQ), and the conventional k-NN technique.
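The abstract does not spell out how the features are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny hand-made lexicon (`CATEGORY_OF`, hypothetical) that maps words to semantic categories, computes first-order features as category frequencies, and computes second-order features as counts of category pairs that co-occur within a short sliding window (`window`, also an assumption). The resulting vectors could then be fed to any classifier, such as an MLP.

```python
from collections import Counter
from itertools import combinations

# Hypothetical word -> semantic category lexicon (illustrative only;
# a real system would use a much larger lexical resource).
CATEGORY_OF = {
    "neuron": "biology", "cell": "biology",
    "theorem": "mathematics", "proof": "mathematics",
    "network": "technology", "computer": "technology",
}
CATEGORIES = sorted(set(CATEGORY_OF.values()))
# Unordered category pairs, each stored in alphabetical order.
PAIRS = list(combinations(CATEGORIES, 2))


def category_sequence(text):
    """Map each known word to its category; unknown words are dropped."""
    return [CATEGORY_OF[w] for w in text.lower().split() if w in CATEGORY_OF]


def first_order_features(text):
    """Relative frequency of each semantic category in the document."""
    cats = category_sequence(text)
    counts = Counter(cats)
    total = len(cats) or 1  # avoid division by zero for empty documents
    return [counts[c] / total for c in CATEGORIES]


def cooccurrence_features(text, window=3):
    """Count distinct-category pairs inside a sliding window of
    `window` consecutive category tokens (second-order features)."""
    cats = category_sequence(text)
    counts = Counter()
    for i, a in enumerate(cats):
        for b in cats[i + 1 : i + window]:
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return [counts[p] for p in PAIRS]


doc = "the neuron network proof"
print(first_order_features(doc))
print(cooccurrence_features(doc, window=2))
```

Because the feature dimension depends on the number of categories rather than on the vocabulary size, both vectors stay small even for large document collections, which is the dimensionality benefit the abstract alludes to.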