Supportive utility of irrelevant features in data preprocessing

Authors:
Sam Chao;Yiping Li;Mingchui Dong
Affiliations:
Faculty of Science and Technology, University of Macau, Taipa, Macao;Faculty of Science and Technology, University of Macau, Taipa, Macao;Faculty of Science and Technology, University of Macau, Taipa, Macao
Venue:
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2007

Citing 10
Cited 0

A practical approach to feature selection

ML92 Proceedings of the ninth international workshop on Machine learning
C4.5: programs for machine learning

C4.5: programs for machine learning
Very Simple Classification Rules Perform Well on Most Commonly Used Datasets

Machine Learning
Feature Selection: Evaluation, Application, and Small Sample Performance

IEEE Transactions on Pattern Analysis and Machine Intelligence
Feature Selection for Knowledge Discovery and Data Mining

Feature Selection for Knowledge Discovery and Data Mining
Induction of Decision Trees

Machine Learning
Feature Selection Algorithms: A Survey and Experimental Evaluation

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
An introduction to variable and feature selection

The Journal of Machine Learning Research
Benefitting from the variables that variable selection discards

The Journal of Machine Learning Research
Improved use of continuous attributes in C4.5

Journal of Artificial Intelligence Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

Many classification algorithms degrade their learning performance while irrelevant features are introduced. Feature selection is a process to choose an optimal subset of features and removes irrelevant ones. But many feature selection algorithms focus on filtering out the irrelevant attributes regarding the learned task only, not considering their hidden supportive information to other attributes: whether they are really irrelevant or potentially relevant? Since in medical domain, an irrelevant symptom is treated as the one providing neither explicit information nor supportive information for disease diagnosis. Therefore, the traditional feature selection methods may be unsuitable for handling such critical problem. In this paper, we propose a new method that selecting not only the relevant features, but also targeting at the latent useful irrelevant attributes by measuring their supportive importance to other attributes. The empirical results demonstrate a comparison of performance of various classification algorithms on twelve real-life datasets from UCI repository.