Selection of relevant features and examples in machine learning. Artificial Intelligence - Special issue on relevance.
Wrappers for feature subset selection. Artificial Intelligence - Special issue on relevance.
Machine Learning.
SLIQ: A Fast Scalable Classifier for Data Mining. EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology.
Feature selection for high-dimensional genomic microarray data. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning.
Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning.
CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics.
Feature Selection for Support Vector Machines by Means of Genetic Algorithms. ICTAI '03 Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence.
Pattern Classification (2nd Edition).
Feature selection for classifying high-dimensional numerical data. CVPR'04 Proceedings of the 2004 IEEE computer society conference on Computer vision and pattern recognition.
Combined kernel function approach in SVM for diagnosis of cancer. ICNC'05 Proceedings of the First international conference on Advances in Natural Computation - Volume Part I.
Computers in Biology and Medicine.
The performance of learning tasks is very sensitive to the characteristics of the training data. There are several ways to improve learning performance, including standardization, normalization, signal enhancement, and linear or non-linear space embedding methods. Among these, identifying the relevant and informative features is a key step in the data analysis process: it improves predictive performance, reduces the amount of data, and helps in understanding the characteristics of the data. Researchers have developed various methods to extract a set of relevant features, but no single method prevails. Random Forest, an ensemble classifier built from a set of tree classifiers, yields good classification performance. Taking advantage of Random Forest and using the wrapper approach first introduced by Kohavi and John, we propose a new algorithm to find an optimal subset of features. The Random Forest is used to obtain feature ranking values, and these values determine which features are eliminated in each iteration of the algorithm. We conducted experiments with two public datasets: colon cancer and leukemia cancer. The experimental results on this real-world data showed that the proposed method achieves a higher prediction rate than a baseline method on certain datasets, and comparable, sometimes better, performance than widely used feature selection methods.
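The procedure described above, ranking features with a Random Forest and iteratively eliminating the lowest-ranked ones while a wrapper-style evaluation tracks the best subset, can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm; the function name, the fraction of features dropped per iteration, and the use of scikit-learn with cross-validated accuracy as the wrapper criterion are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def rf_feature_elimination(X, y, drop_frac=0.2, min_features=2, random_state=0):
    """Iteratively drop the lowest-ranked features by Random Forest
    importance, returning the subset with the best CV accuracy.
    (Illustrative sketch; drop_frac and the CV criterion are assumed.)"""
    remaining = np.arange(X.shape[1])          # indices of surviving features
    best_score, best_subset = -np.inf, remaining.copy()
    while len(remaining) >= min_features:
        rf = RandomForestClassifier(n_estimators=100, random_state=random_state)
        # Wrapper step: evaluate the current subset by cross-validation.
        score = cross_val_score(rf, X[:, remaining], y, cv=5).mean()
        if score > best_score:
            best_score, best_subset = score, remaining.copy()
        # Ranking step: fit on the subset and rank by feature importance.
        rf.fit(X[:, remaining], y)
        order = np.argsort(rf.feature_importances_)  # ascending importance
        n_drop = max(1, int(drop_frac * len(remaining)))
        remaining = remaining[order[n_drop:]]        # eliminate the weakest
    return best_subset, best_score

# Usage on synthetic data (stand-in for the microarray datasets):
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
subset, score = rf_feature_elimination(X, y)
```

The wrapper evaluation is what distinguishes this from plain importance thresholding: each candidate subset is scored by the classifier itself, and the best-scoring subset is returned rather than the last one produced.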