Grafting: fast, incremental feature selection by gradient descent in function space

Authors:
Simon Perkins; Kevin Lacker; James Theiler
Affiliations:
Space and Remote Sensing Sciences, Los Alamos National Laboratory, Los Alamos, NM; Department of Computer Science, University of California, Berkeley, CA; Space and Remote Sensing Sciences, Los Alamos National Laboratory, Los Alamos, NM
Venue:
The Journal of Machine Learning Research
Year:
2003

Abstract

We present a novel and flexible approach to the problem of feature selection, called grafting. Rather than considering feature selection as separate from learning, grafting treats the selection of suitable features as an integral part of learning a predictor in a regularized learning framework. To make this regularized learning process sufficiently fast for large-scale problems, grafting operates in an incremental, iterative fashion, gradually building up a feature set while training a predictor model using gradient descent. At each iteration, a fast gradient-based heuristic assesses which feature is most likely to improve the existing model; that feature is then added to the model, and the model is incrementally optimized using gradient descent. The algorithm scales linearly with the number of data points and at most quadratically with the number of features. Grafting can be used with a variety of predictor model classes, both linear and non-linear, and for both classification and regression. Experiments are reported here on a variant of grafting for classification, using both linear and non-linear models and a logistic-regression-inspired loss function. Results on a variety of synthetic and real-world data sets are presented. Finally, the relationship between grafting, stagewise additive modelling, and boosting is explored.
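
The incremental loop described in the abstract can be illustrated with a minimal sketch for a linear classifier. Several details here are assumptions, since the abstract does not specify them: the regularizer is taken to be an L1 penalty of strength `lam`, the selection heuristic is taken to be "pick the unused feature with the largest loss-gradient magnitude, and stop when none exceeds `lam`", and the inner optimizer is plain proximal gradient descent (soft-thresholding), swapped in for brevity. The names `loss_gradient` and `graft` and the parameters `lr` and `inner_steps` are hypothetical.

```python
import numpy as np


def loss_gradient(X, y, w):
    """Gradient of the mean logistic loss log(1 + exp(-y * Xw)); y in {-1, +1}."""
    margins = np.clip(y * (X @ w), -50.0, 50.0)  # clip to avoid overflow in exp
    coeff = -y / (1.0 + np.exp(margins))
    return X.T @ coeff / len(y)


def graft(X, y, lam=0.1, lr=0.5, inner_steps=200):
    """Grow a feature set one feature at a time (illustrative sketch only).

    Assumes an L1 penalty of strength `lam`; this choice is not taken
    from the abstract.
    """
    n, d = X.shape
    w = np.zeros(d)
    in_model = np.zeros(d, dtype=bool)  # mask of features currently in the model
    selected = []                       # same features, in order of addition
    while not in_model.all():
        # Gradient heuristic: score each unused feature by |dLoss/dw_j|.
        score = np.abs(loss_gradient(X, y, w))
        score[in_model] = 0.0
        j = int(np.argmax(score))
        if score[j] <= lam:
            break  # no remaining feature can outweigh the assumed L1 penalty
        in_model[j] = True
        selected.append(j)
        # Re-optimize only the weights in the model. The full gradient is
        # recomputed each step for simplicity, which is not the cheapest way.
        for _ in range(inner_steps):
            g = loss_gradient(X, y, w)[in_model]
            step = w[in_model] - lr * g
            # Soft-thresholding handles the non-smooth L1 term.
            w[in_model] = np.sign(step) * np.maximum(np.abs(step) - lr * lam, 0.0)
    return w, selected
```

Calling `graft(X, y)` on an `(n, d)` array `X` with labels `y` in `{-1, +1}` returns the weight vector and the list of selected feature indices in the order they were added. Features that never pass the gradient test keep a weight of exactly zero, which is what makes the procedure a feature selector as well as a learner.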