Pattern extraction method for text classification

Authors:
Hung Son Nguyen; Hui Wang
Affiliations:
Institute of Mathematics, Warsaw University, Banacha 2, Warsaw 02095, Poland; School of Information and Software Engineering, University of Ulster at Jordanstown, N. Ireland, BT37 0QB
Venue:
Technologies for constructing intelligent systems
Year:
2002


Abstract

The quality of classification can be increased by applying a feature extraction algorithm, i.e. an algorithm that finds new and more relevant features, before the learning procedure. In this paper, we investigate a novel feature extraction method for textual data. Usually, texts (documents) are represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words occurring in both the text pattern and the considered document. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve the classification quality on almost all textual data.
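
The feature definition above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the example patterns, and the choice to count distinct shared words are all assumptions made for illustration. The step that discovers the text patterns with rough set and Lattice Machine methods is not shown; the patterns are simply given.

```python
def pattern_feature(pattern, document_words):
    """Return the number of distinct words shared by a pattern and a document.

    Assumption: a word counts once even if it appears several times
    in the document; the abstract does not specify this.
    """
    return len(set(pattern) & set(document_words))


def extract_features(patterns, document):
    """Map a document to one numerical attribute per text pattern."""
    # Assumption: lowercase whitespace tokenization stands in for
    # whatever preprocessing the paper uses.
    words = document.lower().split()
    return [pattern_feature(pattern, words) for pattern in patterns]


# Hypothetical patterns; in the paper these would be extracted from data.
patterns = [
    {"rough", "set", "lattice"},
    {"stock", "market", "price"},
]
document = "Rough set theory and lattice machine"
features = extract_features(patterns, document)
print(features)
```

The resulting list can be appended to, or used instead of, the usual word-based representation before training a classifier.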