Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds

Authors:
Balaji Krishnapuram; Lawrence Carin; Mario A. T. Figueiredo; Alexander J. Hartemink
Affiliations:
-;IEEE;IEEE;-
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2005

Abstract

Recently developed methods for learning sparse classifiers are among the state of the art in supervised learning. These methods learn classifiers that are weighted sums of basis functions, placing sparsity-promoting priors on the weights so that each weight estimate is either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
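The second contribution, bound optimization combined with component-wise updates, can be illustrated with a small pure-Python sketch. It rests on several assumptions not stated in the abstract: the sparsity-promoting prior is taken to be Laplacian (an L1 penalty with weight `lam`), the last class's weights are fixed at zero to remove redundancy, and Böhning's bound on the Hessian of the multinomial log-likelihood supplies a fixed curvature for every coordinate. Under these assumptions each weight update becomes a soft-threshold step that never decreases the penalized log-likelihood. All names (`smlr_componentwise`, `penalized_loglik`, `soft_threshold`, `predict`) are hypothetical, and the loop order and caching are illustrative choices rather than the paper's implementation. The sketch recomputes class probabilities naively for every coordinate, so it shows the update rule, not the efficiency the paper claims.

```python
import math


def softmax_last_zero(z_row):
    """Class probabilities for one sample; the last class has its weights fixed at zero."""
    logits = list(z_row) + [0.0]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(a, delta):
    """soft(a, delta) = sign(a) * max(0, |a| - delta)."""
    if a > delta:
        return a - delta
    if a < -delta:
        return a + delta
    return 0.0


def penalized_loglik(X, y, W, lam):
    """Log-likelihood minus lam * ||W||_1 (log of an assumed Laplacian prior, up to a constant)."""
    ll = 0.0
    for x, label in zip(X, y):
        z = [sum(w_c[k] * x[k] for k in range(len(x))) for w_c in W]
        ll += math.log(softmax_last_zero(z)[label])
    return ll - lam * sum(abs(v) for w_c in W for v in w_c)


def smlr_componentwise(X, y, m, lam, sweeps=50):
    """Hypothetical sketch: sparse multinomial logistic regression fitted by
    component-wise bound optimization.

    X: list of n feature vectors of length d; y: labels in 0..m-1.
    Returns W, a list of m-1 weight vectors (class m-1 is the reference class).
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(m - 1)]
    Z = [[0.0] * (m - 1) for _ in range(n)]  # cached logits Z[j][c] = x_j . W[c]
    # Diagonal of the assumed Böhning bound B = -1/2 [I - 11^T/m] (x) sum_j x_j x_j^T,
    # a lower bound on the Hessian; the entry for (class c, feature k) does not depend on c.
    b_diag = [-(1.0 - 1.0 / m) / 2.0 * sum(x[k] ** 2 for x in X) for k in range(d)]
    for _ in range(sweeps):
        for c in range(m - 1):
            for k in range(d):
                if b_diag[k] == 0.0:
                    continue  # feature k is zero on every sample; its weight stays 0
                # Gradient of the log-likelihood with respect to W[c][k] at the current W.
                g = 0.0
                for j in range(n):
                    p = softmax_last_zero(Z[j])
                    g += X[j][k] * ((1.0 if y[j] == c else 0.0) - p[c])
                # Maximize the quadratic minorizer minus lam*|w| in this one coordinate:
                # w_new = soft(w_old - g / B_kk, -lam / B_kk), where B_kk < 0.
                old = W[c][k]
                new = soft_threshold(old - g / b_diag[k], -lam / b_diag[k])
                if new != old:
                    delta = new - old
                    W[c][k] = new
                    for j in range(n):
                        Z[j][c] += delta * X[j][k]  # keep cached logits in sync
    return W


def predict(W, x):
    """Index of the most probable class for one feature vector."""
    z = [sum(w_c[k] * x[k] for k in range(len(x))) for w_c in W]
    p = softmax_last_zero(z)
    return max(range(len(p)), key=p.__getitem__)


if __name__ == "__main__":
    # Toy data; the first column is a constant bias feature.
    X = [[1.0, 2.0, 0.0], [1.0, -2.0, 0.0], [1.0, 0.0, 3.0]]
    y = [0, 1, 2]
    W = smlr_componentwise(X, y, m=3, lam=0.1, sweeps=100)
    print(W, [predict(W, x) for x in X])
```

Because the curvature B_kk is fixed in advance, no line search or matrix inversion is needed. Any weight whose soft-threshold argument falls inside [-lam/|B_kk|, lam/|B_kk|] is set exactly to zero, which is where the sparsity comes from.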