Feature selection, L1 vs. L2 regularization, and rotational invariance

Authors:
Andrew Y. Ng
Affiliations:
Stanford University, Stanford, CA
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Abstract

We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that with L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well") grows only logarithmically in the number of irrelevant features. This logarithmic rate matches the best known bounds for feature selection, and indicates that L1-regularized logistic regression can be effective even if the number of irrelevant features is exponential in the number of training examples. We also give a lower bound showing that any rotationally invariant algorithm (including logistic regression with L2 regularization, SVMs, and neural networks trained by backpropagation) has a worst-case sample complexity that grows at least linearly in the number of irrelevant features.
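
The contrast between the two regularizers can be illustrated with a small synthetic experiment. The sketch below is not the paper's experimental setup: the data generator, the penalty (rather than constraint) form of the regularizers, the proximal-gradient solver, and all names and constants (`make_data`, `fit`, `lam`, `step`) are assumptions chosen for illustration. It fits logistic regression on data where only the first of 200 features determines the label, once with an L1 penalty and once with an L2 penalty, and counts how many weights end up nonzero.

```python
import numpy as np


def make_data(m=50, n=200, seed=0):
    """Hypothetical toy data: m examples, n Gaussian features, and a label
    that depends only on feature 0 (the other n - 1 features are irrelevant)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, n))
    y = np.where(X[:, 0] > 0, 1.0, -1.0)
    return X, y


def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss, mean(log(1 + exp(-y * Xw)))."""
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))
    return X.T @ coeff / len(y)


def fit(X, y, penalty, lam=0.1, step=0.1, iters=2000):
    """Fit logistic regression with an 'l1' or 'l2' penalty of strength lam.

    'l2' uses plain gradient descent on loss + (lam / 2) * ||w||_2^2.
    'l1' uses proximal gradient descent (soft-thresholding) on
    loss + lam * ||w||_1, which sets small weights exactly to zero.
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = logistic_grad(w, X, y)
        if penalty == "l2":
            w = w - step * (g + lam * w)
        else:
            w = w - step * g
            w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w


if __name__ == "__main__":
    X, y = make_data()
    for penalty in ("l1", "l2"):
        w = fit(X, y, penalty)
        print(penalty, "nonzero weights:", np.count_nonzero(w),
              "weight on feature 0:", round(float(w[0]), 3))
```

In this toy setting the L1 fit typically keeps only a small subset of the weights, while the L2 fit spreads weight across all features. This is only a qualitative illustration of the sparsity effect; it does not reproduce or verify the sample complexity bounds stated in the abstract.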