Logistic regression with weight grouping priors

Authors:
M. Korzeń; S. Jaroszewicz; P. Klęsk
Venue:
Computational Statistics & Data Analysis
Year:
2013

Abstract

A generalization of the commonly used maximum-likelihood learning algorithm for the logistic regression model is considered. It is well known that placing a Laplace prior (an L^1 penalty) on the model coefficients produces a variable selection effect, in which most of the coefficients vanish. It is argued that variable selection is not always desirable: it is often better to group correlated variables together and assign them equal weights. Two new kinds of a priori distributions over weights are investigated, the Gaussian Extremal Mixture (GEM) and the Laplacian Extremal Mixture (LEM), which enforce grouping of model coefficients in a manner analogous to L^1 and L^2 regularization. An efficient learning algorithm is presented that simultaneously finds the model weights and the hyperparameters of those priors. The experimental part gives examples in which the proposed priors outperform the Gauss and Laplace priors, as well as other methods that take coefficient grouping into account, such as the elastic net. Theoretical results on parameter shrinkage and sample complexity are also included.
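
The contrast the abstract draws between the Laplace and Gauss priors can be illustrated with a minimal sketch. It is not the GEM/LEM algorithm from the paper: it only fits the two baseline priors by proximal gradient descent, a technique chosen here for brevity. The function names (`fit_map_logreg`, `make_data`, `laplace_lam_max`), the synthetic data with two nearly identical columns, and the penalty strengths are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each entry toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_map_logreg(X, y, lam, prior, n_iter=2000):
    """MAP logistic regression by proximal gradient descent (illustrative).

    prior="laplace" penalizes lam * ||w||_1,
    prior="gauss" penalizes (lam / 2) * ||w||_2^2.
    Labels y are in {0, 1}; no intercept is fitted, to keep the sketch short.
    """
    n, d = X.shape
    # The gradient of the mean logistic loss is Lipschitz with constant
    # ||X||_2^2 / (4 n); its reciprocal is used as the step size.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / n
        v = w - step * grad
        if prior == "laplace":
            w = soft_threshold(v, step * lam)
        elif prior == "gauss":
            w = v / (1.0 + step * lam)
        else:
            raise ValueError("prior must be 'laplace' or 'gauss'")
    return w


def laplace_lam_max(X, y):
    """Smallest L^1 penalty at which the all-zero weight vector is optimal."""
    return np.max(np.abs(X.T @ (0.5 - y))) / X.shape[0]


def make_data(n=200, seed=0):
    """Hypothetical data: two nearly identical informative columns, three noise columns."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    X = np.column_stack([
        z + 0.01 * rng.normal(size=n),
        z + 0.01 * rng.normal(size=n),
        rng.normal(size=(n, 3)),
    ])
    y = (z + 0.5 * rng.normal(size=n) > 0).astype(float)
    return X, y


X, y = make_data()
lam = 0.5 * laplace_lam_max(X, y)
w_laplace = fit_map_logreg(X, y, lam, "laplace")
w_gauss = fit_map_logreg(X, y, lam, "gauss")
print("Laplace prior weights:", np.round(w_laplace, 3))
print("Gauss prior weights:  ", np.round(w_gauss, 3))
```

Comparing the two printed weight vectors shows how each prior treats the correlated pair in the first two columns. The soft-thresholding step is what allows the Laplace fit to set coefficients exactly to zero, while the Gauss fit only rescales them and therefore gives the two nearly identical columns nearly equal weights.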