In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression, in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection evaluates each feature only once, when it is generated. We describe information-investing and α-investing, two adaptive complexity penalty methods for streamwise feature selection that dynamically adjust the threshold on the error reduction required for adding a new feature. Both methods provide false discovery rate (FDR)-style guarantees against overfitting. They differ from standard penalty methods such as AIC, BIC, and RIC, which always drastically over- or under-fit in the limit of an infinite number of non-predictive features. Empirical results show that streamwise regression is competitive with (on small data sets) and superior to (on large data sets) much more compute-intensive feature selection methods such as stepwise regression, and that it allows feature selection on problems with millions of potential features.
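To make the idea concrete, the sketch below shows an α-investing-style streamwise selector in Python. It is a minimal sketch under stated assumptions: the bid rule α_i = wealth/(2i), the initial wealth w0 = 0.5, the pay-out α_Δ = 0.5 earned when a feature is accepted, and a one-degree-of-freedom F-test p-value for each candidate's error reduction are all illustrative choices, not the authors' reference implementation.

```python
# Sketch of alpha-investing-style streamwise feature selection.
# Assumptions (not the authors' reference code): bid alpha_i = wealth / (2*i),
# pay-out alpha_delta added to wealth on acceptance, bid subtracted on rejection,
# and a 1-df F-test p-value measuring each candidate feature's error reduction.
import numpy as np
from scipy import stats


def streamwise_alpha_investing(X, y, w0=0.5, alpha_delta=0.5):
    """Scan features in the order given, accepting each one whose p-value
    beats the current bid alpha_i = wealth / (2 * i)."""
    n, p = X.shape
    selected = []                      # indices of accepted features
    design = np.ones((n, 1))           # current model: intercept only
    rss = float(np.sum((y - y.mean()) ** 2))
    wealth = w0

    for i in range(1, p + 1):
        alpha_i = wealth / (2 * i)     # bid for this candidate feature
        candidate = np.column_stack([design, X[:, i - 1]])

        # Least-squares fit with the candidate feature added.
        beta, _, _, _ = np.linalg.lstsq(candidate, y, rcond=None)
        rss_new = float(np.sum((y - candidate @ beta) ** 2))

        # One-degree-of-freedom F-test on the reduction in residual error.
        df_resid = n - candidate.shape[1]
        if df_resid <= 0 or rss_new <= 0:
            p_value = 1.0
        else:
            f_stat = (rss - rss_new) / (rss_new / df_resid)
            p_value = float(stats.f.sf(f_stat, 1, df_resid))

        if p_value < alpha_i:
            # Accept: keep the feature and earn back alpha_delta of wealth.
            selected.append(i - 1)
            design, rss = candidate, rss_new
            wealth += alpha_delta - alpha_i
        else:
            # Reject: pay the bid; wealth shrinks, tightening future thresholds.
            wealth -= alpha_i

    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 1000))            # mostly non-predictive features
    y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(size=500)
    print(streamwise_alpha_investing(X, y))     # typically recovers features 3 and 7
```

The design point the sketch illustrates is that each feature is tested only once as it arrives, and a rejection costs only the current bid, so the remaining wealth, and hence the acceptance threshold, adapts to how many useful features have been found so far rather than being fixed in advance as with AIC-, BIC-, or RIC-style penalties.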