Rough set feature selection methods for case-based categorization of text documents

Authors:
Kalyan Moy Gupta; Philip G. Moore; David W. Aha; Sankar K. Pal
Affiliations:
ITT Industries, Alexandria, VA; ITT Industries, Alexandria, VA; ITT Industries, Alexandria, VA; Indian Statistical Institute, Kolkata, India
Venue:
PReMI'05: Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence
Year:
2005

Abstract

Textual case bases can contain thousands of features in the form of tokens or words, which can degrade classification performance. Recent developments in rough set theory and its application to feature selection offer promising approaches for selecting and reducing the number of features. We adapt two rough set feature selection methods for use on n-ary class text categorization problems. We also introduce a new method that selects features by computing the union of the features selected from randomly partitioned training subsets. A comparative evaluation of this method against a conventional method on the Reuters-21578 data set shows that the randomized training set partitions dramatically decrease training time without compromising classification accuracy.
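The abstract does not name the two adapted rough set methods, so the sketch below substitutes the classic greedy QuickReduct heuristic as an assumed stand-in. QuickReduct repeatedly adds the term that most increases the rough set dependency degree gamma_B(D) = |POS_B(D)| / |U| until it matches the dependency of the full vocabulary. The sketch then illustrates the partition idea: shuffle the training documents, split them into k disjoint subsets, compute a reduct on each, and return the union of the selected terms. The function names (`positive_region_size`, `dependency`, `quick_reduct`, `partitioned_union_reduct`), the binary term-presence representation, the round-robin split, and the toy documents are all hypothetical choices for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict
from typing import FrozenSet, Hashable, Sequence, Set

Doc = FrozenSet[str]  # assumption: a document is the set of terms it contains


def positive_region_size(docs: Sequence[Doc], labels: Sequence[Hashable],
                         features: Set[str]) -> int:
    """Count documents in POS_B(D): indiscernibility classes with one label."""
    ordered = sorted(features)
    blocks = defaultdict(set)  # feature-value signature -> labels in that block
    counts = defaultdict(int)  # feature-value signature -> block size
    for doc, label in zip(docs, labels):
        key = tuple(term in doc for term in ordered)
        blocks[key].add(label)
        counts[key] += 1
    # A block is in the positive region only if all its documents share a class.
    return sum(counts[k] for k, ls in blocks.items() if len(ls) == 1)


def dependency(docs, labels, features) -> float:
    """Rough set dependency degree gamma_B(D) = |POS_B(D)| / |U|."""
    return positive_region_size(docs, labels, features) / len(docs) if docs else 0.0


def quick_reduct(docs, labels, vocabulary: Set[str]) -> Set[str]:
    """Greedy QuickReduct (an assumed stand-in for the paper's methods)."""
    target = positive_region_size(docs, labels, vocabulary)
    reduct: Set[str] = set()
    current = positive_region_size(docs, labels, reduct)
    while current < target:
        best, best_size = None, current
        for term in sorted(vocabulary - reduct):  # sorted: deterministic ties
            size = positive_region_size(docs, labels, reduct | {term})
            if size > best_size:
                best, best_size = term, size
        if best is None:  # no single term raises dependency; stop early
            break
        reduct.add(best)
        current = best_size
    return reduct


def partitioned_union_reduct(docs, labels, k: int, seed: int = 0) -> Set[str]:
    """Union of reducts computed on k random disjoint training subsets."""
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    selected: Set[str] = set()
    for part in range(k):
        chunk = order[part::k]  # round-robin split of the shuffled indices
        sub_docs = [docs[i] for i in chunk]
        sub_labels = [labels[i] for i in chunk]
        vocabulary = set().union(*sub_docs)
        selected |= quick_reduct(sub_docs, sub_labels, vocabulary)
    return selected


# Hypothetical toy data with two Reuters-style labels.
docs = [frozenset({"oil", "price"}), frozenset({"oil", "crude"}),
        frozenset({"wheat", "grain"}), frozenset({"grain", "export"}),
        frozenset({"price", "grain"})]
labels = ["crude", "crude", "grain", "grain", "grain"]
print(quick_reduct(docs, labels, set().union(*docs)))  # {'grain'}
print(partitioned_union_reduct(docs, labels, k=2))
```

Each subset reduct only has to examine the documents and terms in its own partition, which suggests why partitioning can cut training time. Whether the union preserves accuracy is the empirical question the paper evaluates on Reuters-21578, not something this toy example demonstrates.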