In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression, in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection evaluates each feature only once, when it is generated. We describe information-investing and α-investing, two adaptive complexity penalty methods for streamwise feature selection that dynamically adjust the threshold on the error reduction required for adding a new feature. Both methods provide false discovery rate (FDR)-style guarantees against overfitting. They differ from standard penalty methods such as AIC, BIC, and RIC, which always drastically over- or under-fit in the limit of an infinite number of non-predictive features. Empirical results show that streamwise regression is competitive with (on small data sets) and superior to (on large data sets) much more compute-intensive feature selection methods such as stepwise regression, and that it allows feature selection on problems with millions of potential features.
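To make the idea concrete, the sketch below shows an α-investing-style streamwise selector in Python. It is a minimal sketch under stated assumptions: the bid rule α_i = wealth/(2i), the initial wealth w0 = 0.5, the pay-out α_Δ = 0.5 earned when a feature is accepted, and a one-degree-of-freedom F-test p-value for each candidate's error reduction are all illustrative choices, not the authors' reference implementation.

```python
# Sketch of alpha-investing-style streamwise feature selection.
# Assumptions (not the authors' reference code): bid alpha_i = wealth / (2*i),
# pay-out alpha_delta added to wealth on acceptance, bid subtracted on rejection,
# and a 1-df F-test p-value measuring each candidate feature's error reduction.
import numpy as np
from scipy import stats


def streamwise_alpha_investing(X, y, w0=0.5, alpha_delta=0.5):
    """Scan features in the order given, accepting each one whose p-value
    beats the current bid alpha_i = wealth / (2 * i)."""
    n, p = X.shape
    selected = []                      # indices of accepted features
    design = np.ones((n, 1))           # current model: intercept only
    rss = float(np.sum((y - y.mean()) ** 2))
    wealth = w0

    for i in range(1, p + 1):
        alpha_i = wealth / (2 * i)     # bid for this candidate feature
        candidate = np.column_stack([design, X[:, i - 1]])

        # Least-squares fit with the candidate feature added.
        beta, _, _, _ = np.linalg.lstsq(candidate, y, rcond=None)
        rss_new = float(np.sum((y - candidate @ beta) ** 2))

        # One-degree-of-freedom F-test on the reduction in residual error.
        df_resid = n - candidate.shape[1]
        if df_resid <= 0 or rss_new <= 0:
            p_value = 1.0
        else:
            f_stat = (rss - rss_new) / (rss_new / df_resid)
            p_value = float(stats.f.sf(f_stat, 1, df_resid))

        if p_value < alpha_i:
            # Accept: keep the feature and earn back alpha_delta of wealth.
            selected.append(i - 1)
            design, rss = candidate, rss_new
            wealth += alpha_delta - alpha_i
        else:
            # Reject: pay the bid; wealth shrinks, tightening future thresholds.
            wealth -= alpha_i

    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 1000))            # mostly non-predictive features
    y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(size=500)
    print(streamwise_alpha_investing(X, y))     # typically recovers features 3 and 7
```

The design point the sketch illustrates is that each feature is tested only once as it arrives, and a rejection costs only the current bid, so the remaining wealth, and hence the acceptance threshold, adapts to how many useful features have been found so far rather than being fixed in advance as with AIC-, BIC-, or RIC-style penalties.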