The performance of a classification algorithm in data mining is greatly affected by the quality of the data source. Irrelevant and redundant features not only increase the cost of the mining process but can also degrade the quality of the results. This issue is particularly important for high-dimensional data, in which many features may be either irrelevant or redundant for a selected classification target. Accordingly, feature selection becomes an essential part of data preparation. Feature selection for classification aims to identify and remove irrelevant and redundant features, which do not contribute to modeling the selected target. Among existing feature selection methods, the fast correlation-based filter and correlation-based feature selection are the most commonly used approaches. The main concern with these methods is that, because of certain inherent limitations, they may oversimplify the feature set of a given data set by removing many useful features. As a result, the selected feature set may be too simplified to be useful in practice. In this paper, we analyze this issue and present an extended fast feature selection algorithm to overcome it. Experiments are conducted using real data from financial institutions to demonstrate the improvement in the quality of the selected features. A comparison of results between the proposed method and three other major methods is provided.
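For orientation, the relevance-and-redundancy filtering that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard fast correlation-based filter idea (rank features by symmetrical uncertainty with the target, then drop any feature that is at least as correlated with an already selected feature as with the target). It is not the extended algorithm proposed in the paper. The function names, the dictionary-based input format, and the default threshold are assumptions made for this example, and all features are assumed to be discrete.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    joint = entropy(list(zip(x, y)))
    info_gain = hx + hy - joint
    return 2.0 * info_gain / (hx + hy)


def fcbf_like_select(features, target, threshold=0.0):
    """Hypothetical helper: features maps a name to a list of discrete values.

    Step 1 keeps features whose SU with the target exceeds the threshold.
    Step 2 walks them in decreasing relevance and drops a feature when an
    already selected feature is at least as correlated with it as the
    target is (the redundancy test of the basic filter).
    """
    relevance = {
        name: symmetrical_uncertainty(col, target)
        for name, col in features.items()
    }
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(features[kept], features[name])
            >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected
```

Because the redundancy test removes every feature that a stronger feature dominates, a single highly relevant feature can eliminate many others. That is one way the oversimplification described above can arise in this kind of filter.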