Redundant feature elimination by using approximate Markov blanket based on discriminative contribution

  • Authors:
  • Xue-Qiang Zeng; Su-Fen Chen; Hua-Xing Zou

  • Affiliations:
  • Computer Center, Nanchang University, Nanchang, China; Department of Computer Science and Technology, Nanchang Institute of Technology, Nanchang, China; Computer Center, Nanchang University, Nanchang, China

  • Venue:
  • WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
  • Year:
  • 2011

Abstract

Text data sets are high-dimensional and hard to analyze, because many weakly relevant but redundant features hurt the generalization performance of classifiers. Previous works handle this problem with pair-wise feature similarities, which ignore the discriminative contribution of each feature that can be measured from the label information. Here we define an Approximate Markov Blanket (AMB) based on the metric of DIScriminative Contribution (DISC) to eliminate redundant features, and propose the AMB-DISC algorithm. Experimental results on the Reuters-21578 data set show that AMB-DISC substantially outperforms previous state-of-the-art feature selection algorithms that consider feature redundancy, in terms of both micro-averaged F1 (MicroavgF1) and macro-averaged F1 (MacroavgF1).
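The abstract does not define the DISC metric, so the sketch below is not the AMB-DISC algorithm. It only illustrates the generic approximate-Markov-blanket elimination loop, substituting symmetric uncertainty (SU), the pair-wise measure used by the FCBF algorithm, as a stand-in scoring function. Under that assumption, an already-selected feature g is treated as an approximate Markov blanket for a candidate f when g is at least as relevant to the class as f and SU(g, f) >= SU(f, class). The names `symmetric_uncertainty`, `amb_eliminate`, and the `threshold` parameter are hypothetical, and features are assumed to be discrete.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1].

    Stand-in for the paper's DISC metric, which the abstract does not define.
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def amb_eliminate(features, labels, threshold=0.0):
    """Hypothetical AMB-style redundancy elimination (FCBF-like, not AMB-DISC).

    features: dict mapping feature name -> list of discrete values.
    labels:   list of class labels, same length as each feature.
    Returns the names of the retained features, most relevant first.
    """
    relevance = {f: symmetric_uncertainty(v, labels) for f, v in features.items()}
    # Drop weakly relevant features, then rank the rest by relevance (descending).
    ranked = sorted(
        (f for f in features if relevance[f] > threshold),
        key=lambda f: relevance[f],
        reverse=True,
    )
    selected = []
    for f in ranked:
        # Every g in `selected` is at least as relevant as f (ranked order),
        # so g forms an approximate Markov blanket for f when SU(g, f) >= SU(f, C).
        redundant = any(
            symmetric_uncertainty(features[g], features[f]) >= relevance[f]
            for g in selected
        )
        if not redundant:
            selected.append(f)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    feats = {
        "f1": [0, 0, 1, 1],  # fully predictive of the label
        "f2": [0, 0, 1, 1],  # exact copy of f1, hence redundant
        "f3": [0, 1, 0, 1],  # independent of the label
    }
    print(amb_eliminate(feats, labels))
```

In the demo, `f3` is dropped as irrelevant and `f2` is dropped because `f1` already covers it. Swapping a different scoring function into `symmetric_uncertainty` is where a label-aware metric such as DISC would plug in; how the paper actually computes DISC is not stated in this abstract.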