Supportive utility of irrelevant features in data preprocessing

  • Authors:
  • Sam Chao;Yiping Li;Mingchui Dong

  • Affiliations:
  • Faculty of Science and Technology, University of Macau, Taipa, Macao;Faculty of Science and Technology, University of Macau, Taipa, Macao;Faculty of Science and Technology, University of Macau, Taipa, Macao

  • Venue:
  • PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2007

Abstract

The learning performance of many classification algorithms degrades when irrelevant features are introduced. Feature selection is the process of choosing an optimal subset of features and removing irrelevant ones. However, many feature selection algorithms filter out attributes that are irrelevant with respect to the learning task alone, without considering the hidden supportive information those attributes may provide to other attributes, that is, whether they are truly irrelevant or potentially relevant. In the medical domain, a symptom is treated as irrelevant only if it provides neither explicit nor supportive information for disease diagnosis. Traditional feature selection methods may therefore be unsuitable for such critical problems. In this paper, we propose a new method that not only selects the relevant features but also targets latently useful irrelevant attributes by measuring their supportive importance to other attributes. The empirical results compare the performance of various classification algorithms on twelve real-life datasets from the UCI repository.
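The abstract does not say how supportive importance is measured, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes information gain as the relevance measure and defines a hypothetical `supportive_gain` as the extra gain a candidate attribute contributes when paired with another attribute. The toy XOR data and all function names are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, labels, cols):
    """Information gain of the class given the joint values of columns `cols`."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[c] for c in cols)
        groups.setdefault(key, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def supportive_gain(rows, labels, candidate, partner):
    """Hypothetical measure: gain of the pair minus gain of the partner alone."""
    pair = info_gain(rows, labels, [partner, candidate])
    return pair - info_gain(rows, labels, [partner])


# Toy data (assumed): the class is the XOR of two binary attributes, so each
# attribute carries no information about the class when examined alone.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in rows]

alone = info_gain(rows, labels, [1])                            # attribute 1 by itself
support = supportive_gain(rows, labels, candidate=1, partner=0)  # attribute 1 paired with 0
print(f"gain alone: {alone:.3f}, supportive gain: {support:.3f}")
```

On this toy data, a filter that scores each attribute against the class in isolation would discard attribute 1, although pairing it with attribute 0 determines the class completely. This is the kind of latent usefulness the abstract describes.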