Boosting feature selection using information metric for classification

  • Authors:
  • Huawen Liu;Lei Liu;Huijie Zhang

  • Affiliations:
  • College of Computer Science and Technology, Jilin University, Changchun 130012, PR China;College of Computer Science and Technology, Jilin University, Changchun 130012, PR China;Department of Computer Science, Northeast Normal University, Changchun 130021, PR China

  • Venue:
  • Neurocomputing
  • Year:
  • 2009

Abstract

Feature selection plays an important role in pattern classification. Its purpose is to remove as many redundant features as possible from a data set. Useless features may not only degrade the performance of learning algorithms but also obscure important information (e.g., intrinsic structure) in the data. With new and emerging techniques, data sets in many domains are growing ever larger, and irrelevant features are often prevalent in them. This poses great challenges to traditional learning algorithms, such as low efficiency and over-fitting, so an efficient technique is needed to eliminate redundant or irrelevant features. Many attempts have been made to address this problem, and a variety of feature selection methods have been proposed. Unlike these methods, in this paper we propose a general boosting scheme for feature selection based on an information metric. Its primary characteristic is that it exploits the weights of the data to select salient features, and these weights are updated dynamically after each candidate feature is selected. The information criterion used in the feature selector can therefore accurately represent the degree of relevance between the features and the class labels, so that the selected feature subset has maximal relevance to the class labels. Simulation studies on UCI data sets show that, in most cases, the classification performance achieved by the proposed method is better than that of other selection methods.
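The idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual information metric or weight-update rule, so everything concrete here is an assumption: the criterion is taken to be mutual information computed from per-instance weights, and the weights are updated with an AdaBoost-style rule driven by a weighted-majority vote on the feature just selected. The function names `weighted_mutual_information` and `boosted_feature_selection` are hypothetical, and features are assumed to be discrete.

```python
import math
from collections import defaultdict


def weighted_mutual_information(feature, labels, weights):
    """Mutual information I(F; C) in bits, with probabilities taken
    as normalized sums of instance weights (an assumed criterion)."""
    total = sum(weights)
    joint = defaultdict(float)
    p_f = defaultdict(float)
    p_c = defaultdict(float)
    for f, c, w in zip(feature, labels, weights):
        p = w / total
        joint[(f, c)] += p
        p_f[f] += p
        p_c[c] += p
    return sum(p * math.log2(p / (p_f[f] * p_c[c]))
               for (f, c), p in joint.items() if p > 0)


def boosted_feature_selection(columns, labels, k):
    """Greedily pick up to k feature indices, reweighting instances
    after each pick. `columns[j]` holds the values of feature j."""
    n = len(labels)
    weights = [1.0 / n] * n
    selected = []
    remaining = set(range(len(columns)))
    while remaining and len(selected) < k:
        # Choose the feature most relevant under the current weights.
        best = max(sorted(remaining),
                   key=lambda j: weighted_mutual_information(
                       columns[j], labels, weights))
        selected.append(best)
        remaining.remove(best)

        # Assumed update: predict the weighted-majority class for each
        # value of the chosen feature, then down-weight the instances
        # it already explains (AdaBoost-style).
        votes = defaultdict(lambda: defaultdict(float))
        for f, c, w in zip(columns[best], labels, weights):
            votes[f][c] += w
        predict = {f: max(cs, key=cs.get) for f, cs in votes.items()}
        miss = [predict[f] != c for f, c in zip(columns[best], labels)]
        err = sum(w for w, m in zip(weights, miss) if m)
        if 0 < err < 0.5:
            beta = err / (1 - err)
            weights = [w if m else w * beta
                       for w, m in zip(weights, miss)]
            total = sum(weights)
            weights = [w / total for w in weights]
    return selected


# Toy data: feature 1 duplicates feature 0; feature 2 only helps on
# the one instance that feature 0 gets wrong.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
print(boosted_feature_selection(columns, labels, 2))
```

On this toy data, a static ranking by unweighted mutual information would place the duplicate feature second. With the reweighting step, the instance missed by the first feature gains weight, so the complementary feature 2 is selected instead. This is only meant to show why dynamic weights can change the ranking; it makes no claim about the update used in the paper.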