Rough set feature selection algorithms for textual case-based classification

Authors:
Kalyan Moy Gupta; David W. Aha; Philip Moore
Affiliations:
Knexus Research Corp., Springfield, VA; Naval Research Laboratory (Code 5515), Washington, DC; AES Division, ITT Industries, Alexandria, VA
Venue:
ECCBR'06 Proceedings of the 8th European conference on Advances in Case-Based Reasoning
Year:
2006

Abstract

Feature selection algorithms can reduce the high dimensionality of textual cases and increase case-based task performance. However, conventional algorithms (e.g., information gain) are computationally expensive. We previously showed that, on one dataset, a rough set feature selection algorithm can reduce computational complexity without sacrificing task performance. Here we test the generality of our findings on additional feature selection algorithms, add one dataset, and improve our empirical methodology. We observed that features of textual cases vary in their contribution to task performance based on their part of speech, and adapted the algorithms to include a part-of-speech bias as background knowledge. Our evaluation shows that injecting this bias significantly increases task performance for rough set algorithms, and that one of these algorithms attained significantly higher classification accuracy than information gain. We also confirmed that, under some conditions, randomized training partitions can dramatically reduce training times for rough set algorithms without compromising task performance.
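
The abstract does not describe the rough set algorithms themselves. As an illustration only, the sketch below shows one common way rough set feature selection can work: a greedy, QuickReduct-style search that adds the feature which most increases the rough set dependency degree (the fraction of cases whose feature values determine their class unambiguously). This is not necessarily the authors' method; the function names `dependency` and `greedy_reduct` and the toy cases are assumptions made for this example.

```python
from collections import defaultdict


def dependency(cases, labels, features):
    """Rough set dependency degree: the fraction of cases that fall in an
    equivalence class (cases with identical values on `features`) whose
    members all share one label."""
    blocks = defaultdict(list)
    for case, label in zip(cases, labels):
        key = tuple(case[f] for f in features)
        blocks[key].append(label)
    consistent = sum(len(b) for b in blocks.values() if len(set(b)) == 1)
    return consistent / len(cases)


def greedy_reduct(cases, labels):
    """Greedily add the feature that most increases the dependency degree
    until the subset discriminates as well as the full feature set."""
    all_features = sorted(cases[0])
    target = dependency(cases, labels, all_features)
    selected = []
    while dependency(cases, labels, selected) < target:
        best = max(
            (f for f in all_features if f not in selected),
            key=lambda f: dependency(cases, labels, selected + [f]),
        )
        selected.append(best)
    return selected


# Hypothetical toy data: binary term-presence features for four textual cases.
cases = [
    {"engine": 1, "fire": 1, "runway": 0},
    {"engine": 1, "fire": 0, "runway": 0},
    {"engine": 0, "fire": 0, "runway": 1},
    {"engine": 0, "fire": 1, "runway": 1},
]
labels = ["mechanical", "mechanical", "operational", "operational"]

print(greedy_reduct(cases, labels))  # -> ['engine']
```

In this toy data, "engine" alone separates the two classes, so the search stops after one feature, while "fire" has a dependency degree of zero. A part-of-speech bias of the kind the abstract mentions could, for instance, restrict or reorder the candidate features before the search, though the abstract does not say how the authors implemented it.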