Preprocessing the data to filter out redundant and irrelevant features is one of the most important steps in the data mining process. Careful feature selection can reduce the computational time needed to induce subsequent models and can also improve the quality of those models. Using fewer features often yields simpler, easier-to-interpret models, and identifying the important features can provide valuable insights into the application. The feature selection problem is inherently a combinatorial optimization problem: with n features there are 2^n candidate subsets. This paper builds on a metaheuristic called the nested partitions method, which has been shown to be particularly effective for feature selection. Specifically, we focus on the scalability of the method and show that its performance is vastly improved by incorporating random sampling of instances. Furthermore, we develop an adaptive variant of the algorithm that dynamically determines the required sample rate. The adaptive algorithm is shown to perform very well when applied to a set of standard test problems.
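To make the ideas in the abstract concrete, here is a minimal sketch of a generic nested partitions (NP) search over feature subsets, with each iteration evaluated on a random sample of instances and a simple adaptive sample rate. It is not the authors' algorithm, and it rests on several assumptions. The subset quality measure (leave-one-out 1-nearest-neighbor accuracy minus a size penalty `alpha`) is hypothetical. So are the choice to backtrack to the parent region and the rule that enlarges the sample when the best and second-best regions score within `gap_tol` of each other. All function and parameter names are illustrative. A region is a fixed prefix of include/exclude decisions for features 0..k-1. Each step partitions it on the next feature, samples random subsets from both subregions and from the surrounding region, and moves to whichever region looks most promising.

```python
import random

def loo_1nn_accuracy(X, y, features, idx):
    """Leave-one-out 1-nearest-neighbor accuracy on instances `idx`, using only `features`."""
    if not features or len(idx) < 2:
        return 0.0
    correct = 0
    for i in idx:
        best_d, best_label = float("inf"), None
        for j in idx:
            if j != i:
                d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
                if d < best_d:
                    best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(idx)

def score(X, y, bits, idx, alpha):
    """Hypothetical objective: sampled accuracy minus a penalty on the fraction of features used."""
    features = [f for f, b in enumerate(bits) if b]
    return loo_1nn_accuracy(X, y, features, idx) - alpha * len(features) / len(bits)

def sample_solution(region, prefix, n_feat, rng):
    """Draw a random 0/1 feature vector from a subregion (a fixed prefix) or, when
    `region` is None, from the surrounding region (vectors NOT starting with `prefix`)."""
    if region is not None:
        return region + tuple(rng.randint(0, 1) for _ in range(n_feat - len(region)))
    while True:  # rejection sampling; accepts with probability 1 - 2**-len(prefix) >= 1/2
        bits = tuple(rng.randint(0, 1) for _ in range(n_feat))
        if bits[:len(prefix)] != prefix:
            return bits

def nested_partitions(X, y, n_iter=100, samples_per_region=5, rate=0.2,
                      min_rate=0.05, gap_tol=0.02, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n_inst, n_feat = len(X), len(X[0])
    prefix, last_best = (), None
    for _ in range(n_iter):
        if len(prefix) == n_feat:  # singleton region: every feature has been decided
            break
        # Random sample of instances shared by all subset evaluations in this iteration.
        m = min(n_inst, max(2, int(rate * n_inst)))
        idx = rng.sample(range(n_inst), m)
        # Partition: next feature included / excluded, plus the surrounding region (None).
        regions = [prefix + (1,), prefix + (0,)]
        if prefix:
            regions.append(None)
        results = []
        for r in regions:
            draws = [sample_solution(r, prefix, n_feat, rng) for _ in range(samples_per_region)]
            results.append(max((score(X, y, b, idx, alpha), b) for b in draws))
        ranked = sorted(range(len(regions)), key=lambda k: results[k][0], reverse=True)
        best, second = ranked[0], ranked[1]
        last_best = results[best][1]
        # Hypothetical adaptive rule: enlarge the sample when regions are hard to tell apart.
        if results[best][0] - results[second][0] < gap_tol:
            rate = min(1.0, rate * 1.5)
        else:
            rate = max(min_rate, rate * 0.9)
        # Move into the best subregion, or backtrack to the parent if the surrounding region wins.
        prefix = regions[best] if regions[best] is not None else prefix[:-1]
    chosen = prefix if len(prefix) == n_feat else (last_best or ())
    return [f for f, b in enumerate(chosen) if b]

if __name__ == "__main__":
    # Hypothetical synthetic data: feature 0 separates the classes, features 1-3 are noise.
    gen = random.Random(1)
    X, y = [], []
    for i in range(40):
        label = i % 2
        X.append([gen.uniform(0.7, 1.0) if label else gen.uniform(0.0, 0.3)]
                 + [gen.uniform(0.0, 1.0) for _ in range(3)])
        y.append(label)
    print(nested_partitions(X, y))
```

Sampling instances cuts the cost of each evaluation from roughly O(N^2) to O(m^2) for this 1-NN measure. The trade-off is that scores become noisier, which is why the sketch enlarges `m` only when the sampled scores fail to separate the candidate regions.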