COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
C4.5: programs for machine learning
Wrappers for feature subset selection
Artificial Intelligence - Special issue on relevance
Attribute selection for modelling
Future Generation Computer Systems - Special double issue on data mining
Making large-scale support vector machine learning practical
Advances in kernel methods
Feature Selection for Knowledge Discovery and Data Mining
A Survey of Methods for Scaling Up Inductive Algorithms
Data Mining and Knowledge Discovery
Feature selection for high-dimensional genomic microarray data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Efficient Mining from Large Databases by Query Learning
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Pattern Recognition Letters
We propose a new data mining method that is effective for mining from extremely high-dimensional databases. The method iteratively selects a subset of features from a database and builds a hypothesis from that subset. Each feature selection has two steps: first, select the subset of instances for which the predictions of the previously obtained hypotheses are most unreliable; second, select the subset of features whose value distribution in the selected instances differs most from their distribution over all instances in the database. We empirically evaluate the proposed method by comparing its performance with that of two other methods, including the method of Xing et al., one of the latest feature subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods in both final predictive accuracy and the precision attained at a recall given by Xing et al.'s method. We also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at higher noise levels.
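The two-step selection described in the abstract can be illustrated with a minimal sketch. The abstract does not say how unreliability or distributional difference is measured, so everything concrete here is an assumption made for illustration: binary (0/1) features and labels, unreliability measured as the smallest vote margin among previous hypotheses, and distributional difference measured as the absolute gap between a feature's mean in the selected instances and its mean over all instances. The function names `least_reliable_instances` and `most_shifted_features` are hypothetical and are not taken from the paper.

```python
def least_reliable_instances(votes, k):
    """Step 1 (assumed measure): pick the k instances with the smallest vote margin.

    votes[i] holds the 0/1 predictions for instance i, one per previous hypothesis.
    A margin near zero means the hypotheses disagree, i.e. the prediction is unreliable.
    """
    def margin(i):
        ones = sum(votes[i])
        return abs(2 * ones - len(votes[i]))  # |#ones - #zeros|

    # sorted() is stable, so ties keep their original index order.
    return sorted(range(len(votes)), key=margin)[:k]


def most_shifted_features(X, selected, m):
    """Step 2 (assumed measure): pick the m features whose mean value in the
    selected instances differs most from their mean over all instances.

    X is a list of rows; each row is a list of 0/1 feature values.
    """
    n_rows = len(X)
    n_features = len(X[0])

    def mean(rows, j):
        return sum(X[i][j] for i in rows) / len(rows)

    shift = [
        abs(mean(selected, j) - mean(range(n_rows), j))
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: -shift[j])[:m]


# Toy data: 4 instances, 3 binary features, 3 previous hypotheses.
votes = [[1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 1, 0]]
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]

uncertain = least_reliable_instances(votes, k=2)
features = most_shifted_features(X, uncertain, m=2)
```

In an iterative setting, a new hypothesis would then be trained on the columns in `features` and its predictions appended to `votes` before the next round; that training loop is omitted because the abstract does not specify the learner.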