Extended fast feature selection for classification modeling

Authors:
Weijun Wu;Qigang Gao;Muhong Wang
Affiliations:
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada;Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada;Department of Finance and Management Science, Saint Mary's University, Halifax, Nova Scotia, Canada
Venue:
ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
Year:
2006

Abstract

The performance of a classification algorithm in data mining is greatly affected by the quality of the data source. Irrelevant and redundant features not only increase the cost of the mining process but, in some cases, also degrade the quality of the result. This issue is particularly important for high-dimensional data, in which many features may be either irrelevant or redundant for a selected classification target. Accordingly, feature selection becomes an essential part of data preparation. Feature selection for classification aims to identify and remove irrelevant and redundant features, which do not contribute to modeling the selected target. Of the existing feature selection methods, the fast correlation-based filter and correlation-based feature selection are the most commonly used approaches. The main concern with these methods is that, because of certain inherent limitations, they may oversimplify the features of a given data set by removing many useful ones. As a result, the selected feature set may be too simplified to be useful in practice. In this paper, we analyze this issue and present an extended fast feature selection algorithm to overcome it. Experiments are conducted using real data from financial institutions to demonstrate the improvement in the quality of the selected features. A comparison of results between the proposed method and three other major methods is provided.
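
To make the relevance and redundancy filtering mentioned above concrete, the following is a minimal sketch in the style of the fast correlation-based filter, using symmetrical uncertainty as the correlation measure. It is not the extended algorithm proposed in the paper, whose details the abstract does not give. The function names, the `threshold` parameter, and the dictionary-of-lists data layout are assumptions chosen for illustration, and all features are assumed to be discrete.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def fcbf_like_select(features, target, threshold=0.0):
    """Illustrative filter (hypothetical helper, not the paper's method).

    features: dict mapping feature name -> list of discrete values.
    Returns the names of the features that are kept.
    """
    # Relevance step: score each feature against the target and drop
    # those whose score does not exceed the threshold.
    relevance = {
        name: symmetrical_uncertainty(col, target)
        for name, col in features.items()
    }
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Redundancy step: keep a feature only if no already kept feature
    # is at least as correlated with it as it is with the target.
    selected = []
    for name in ranked:
        if all(
            symmetrical_uncertainty(features[kept], features[name])
            < relevance[name]
            for kept in selected
        ):
            selected.append(name)
    return selected
```

In this sketch a single strongly relevant feature can eliminate every feature that is more correlated with it than with the target, which is one way a filter of this kind can end up with a very small feature set.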