The quality of classification can be improved by applying a feature extraction algorithm, i.e. an algorithm that finds new and more relevant features, before the learning procedure. In this paper, we investigate a novel feature extraction method for textual data. Texts (documents) are usually represented as collections of words or keywords. We present a method for finding new numerical attributes that improve the quality of classification. Each new feature is based on a set of words (a text pattern) and is defined as the number of words occurring in both the text pattern and the document under consideration. Our approach is based on rough set methods and Lattice Machine theory. The experimental results show that the presented methods improve classification quality on almost all textual data.
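
The feature definition above can be illustrated with a minimal sketch. It only shows how a numerical attribute might be computed from a given text pattern; it does not show how the patterns are found, which the abstract attributes to rough set methods and Lattice Machine theory. The function names, the tokenizer, the example patterns, and the choice to count distinct words rather than repeated occurrences are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of distinct word tokens.

    Assumption: a simple alphanumeric tokenizer; the source does not
    specify how documents are split into words.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pattern_feature(pattern, document):
    """Count the distinct words shared by a text pattern and a document.

    Assumption: `pattern` is a set of lowercase words, and each shared
    word is counted once regardless of how often it occurs.
    """
    return len(set(pattern) & tokenize(document))


def extract_features(patterns, document):
    """Map a document to one numerical attribute per text pattern."""
    return [pattern_feature(p, document) for p in patterns]


# Hypothetical text patterns, chosen only for this example.
patterns = [{"rough", "set", "attribute"}, {"lattice", "machine"}]
doc = "The lattice machine builds a lattice of hypertuples."
features = extract_features(patterns, doc)
print(features)
```

Each document then becomes a vector with one integer per pattern, and that vector could be passed to any standard classifier in place of, or alongside, the original word-based representation.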