Pattern extraction method for text classification

  • Authors:
  • Hung Son Nguyen; Hui Wang

  • Affiliations:
  • Institute of Mathematics, Warsaw University, Banacha 2, Warsaw 02095, Poland; School of Information and Software Engineering, University of Ulster at Jordanstown, Northern Ireland, BT37 0QB

  • Venue:
  • Technologies for Constructing Intelligent Systems
  • Year:
  • 2002

Abstract

The quality of classification can be increased by applying a feature extraction algorithm, i.e., an algorithm that finds new and more relevant features, before the learning procedure is run. In this paper, we investigate a novel feature extraction method for textual data. Texts (documents) are usually represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words that occur both in the text pattern and in the considered document. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve classification quality on almost all of the textual data sets tested.
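The pattern-based feature described above can be illustrated with a minimal sketch. For a text pattern P and a document d with word set W(d), the feature value is f_P(d) = |P ∩ W(d)|. The code below assumes that documents are tokenized into lowercase alphabetic words and that each shared word is counted once, so word frequency is ignored. The functions `tokenize`, `pattern_feature` and `extract_features`, along with the example patterns, are hypothetical illustrations. The rough set and Lattice Machine procedures the paper uses to find good patterns are not shown.

```python
import re


def tokenize(text: str) -> set[str]:
    # Simplifying assumption: lowercase the text and keep runs of letters a-z.
    return set(re.findall(r"[a-z]+", text.lower()))


def pattern_feature(pattern: set[str], document: str) -> int:
    """Number of distinct words occurring in both the pattern and the document."""
    return len(pattern & tokenize(document))


def extract_features(patterns: list[set[str]], documents: list[str]) -> list[list[int]]:
    # One numerical attribute per pattern (column), one row per document.
    return [[pattern_feature(p, d) for p in patterns] for d in documents]


if __name__ == "__main__":
    # Hypothetical patterns; in the paper these would be found by the learning method.
    patterns = [{"stock", "market", "shares"}, {"goal", "match", "team"}]
    documents = ["Stock market shares fell today.", "The team won the match."]
    print(extract_features(patterns, documents))  # [[3, 0], [0, 2]]
```

The resulting integer matrix can be passed to any classifier as a set of new numerical attributes, either on its own or alongside the original word-based representation.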