Extended fast feature selection for classification modeling

  • Authors:
  • Weijun Wu;Qigang Gao;Muhong Wang

  • Affiliations:
  • Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada;Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada;Department of Finance and Management Science, Saint Mary's University, Halifax, Nova Scotia, Canada

  • Venue:
  • ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
  • Year:
  • 2006

Abstract

The performance of a classification algorithm in data mining is greatly affected by the quality of the data source. Irrelevant and redundant features not only increase the cost of the mining process but, in some cases, also degrade the quality of the result. This issue is particularly important for high-dimensional data, in which many features may be either irrelevant or redundant for a selected classification target. Accordingly, feature selection becomes an essential part of data preparation. Feature selection for classification aims to identify and remove irrelevant and redundant features, which do not contribute to modeling the selected target. Among the existing feature selection methods, the fast correlation-based filter and correlation-based feature selection are the most commonly used approaches. The main concern with these methods is that, because of certain inherent limitations, they may oversimplify the features of a given data set by removing many useful ones. As a result, the selected feature set may be too simplified to be useful in practice. In this paper, we analyze this issue and present an extended fast feature selection algorithm to overcome the problem. Experiments are conducted using real data from financial institutions to demonstrate the improvement in the quality of the selected features. A comparison of results between the proposed method and three other major methods is provided.
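
For readers unfamiliar with the baseline, the following is a minimal Python sketch of the standard fast correlation-based filter idea that the abstract refers to. It is not the extended algorithm proposed in the paper, whose details the abstract does not give. The sketch assumes discrete-valued features. The function names (`entropy`, `symmetrical_uncertainty`, `fcbf`), the dictionary input format, and the strict `> threshold` relevance test are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    joint = entropy(list(zip(x, y)))
    mutual_info = hx + hy - joint
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, target, threshold=0.0):
    """Baseline filter sketch (hypothetical interface).

    features: dict mapping feature name -> list of discrete values
    target:   list of class labels
    Returns the names of the selected features, most relevant first.
    """
    # Step 1: relevance of each feature to the class.
    relevance = {name: symmetrical_uncertainty(col, target)
                 for name, col in features.items()}
    # Assumption: keep only features strictly above the threshold.
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)

    # Step 2: drop a feature when an already-kept feature is at least as
    # correlated with it as it is with the class (redundancy test).
    selected = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(features[kept], features[name])
            >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected
```

The redundancy test in step 2 is aggressive: any feature that a stronger feature "covers" is discarded outright. That behavior is the kind of over-simplification the abstract raises as a concern with these methods.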