An improved text categorization methodology based on second and third order probabilistic feature extraction and neural network classifiers

Authors:
D. A. Karras
Affiliations:
Chalkis Institute of Technology, Automation Department & Hellenic Open University, Athens, Greece
Venue:
KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
Year:
2006

Abstract

In this paper we address text categorization, the main aspect of the problem of extracting meaning from documents, and outline a novel and systematic approach to its solution. We present a text categorization system for full-text documents that are not domain specific. The main contribution of this paper lies in the feature extraction methodology, which operates on word semantic categories rather than on raw words, as rival approaches do. To cope with the problem of dimensionality reduction, the proposed approach introduces novel second and third order features for text categorization, derived from co-occurrence analysis of word semantic categories. The suggested methodology compares favorably with widely accepted techniques based on raw word frequencies on a collection of documents concerning the Dewey Decimal Classification (DDC) system. These comparisons involve several Multilayer Perceptron (MLP) algorithms, Learning Vector Quantization (LVQ), and the conventional k-NN technique.
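
The abstract does not specify how the second and third order features are computed, so the following is only a minimal sketch of one plausible reading: words are mapped to semantic categories, and co-occurrences of category pairs (second order) and triples (third order) are counted within a short window. The lexicon `CATEGORY_OF`, the function names, and the `window` parameter are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical word-to-category lexicon (assumption: the paper's actual
# semantic categories are not listed in the abstract).
CATEGORY_OF = {
    "neuron": "biology", "cell": "biology",
    "network": "computing", "algorithm": "computing",
    "theorem": "mathematics", "proof": "mathematics",
}


def category_sequence(tokens, lexicon=CATEGORY_OF):
    """Replace each known word by its semantic category; drop unknown words."""
    return [lexicon[t] for t in tokens if t in lexicon]


def cooccurrence_features(categories, order=2, window=3):
    """Count unordered groups of `order` categories co-occurring in a window.

    Each position is combined with every choice of (order - 1) categories
    from the next (window - 1) positions, so each group of positions is
    counted exactly once.
    """
    counts = Counter()
    for i, first in enumerate(categories):
        following = categories[i + 1 : i + window]
        for rest in combinations(following, order - 1):
            counts[tuple(sorted((first, *rest)))] += 1
    return counts


def to_probabilities(counts):
    """Normalize raw co-occurrence counts so that they sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


if __name__ == "__main__":
    tokens = "the neuron network uses an algorithm with proof".split()
    cats = category_sequence(tokens)
    print(to_probabilities(cooccurrence_features(cats, order=2)))
    print(to_probabilities(cooccurrence_features(cats, order=3)))
```

Because the number of semantic categories is far smaller than the vocabulary, the resulting pair and triple vectors stay much shorter than a raw word frequency vector, which is the dimensionality argument the abstract makes. Under this reading, such vectors would then serve as input to an MLP, LVQ, or k-NN classifier.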