Learning with Lq

Authors:
Ata Kabán; Robert J. Durrant
Affiliations:
School of Computer Science, The University of Birmingham, Birmingham, UK B15 2TT; School of Computer Science, The University of Birmingham, Birmingham, UK B15 2TT
Venue:
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Year:
2008

References:
An introduction to variational methods for graphical models. Learning in graphical models.
Learning in Neural Networks: Theoretical Foundations.
Feature Selection via Concave Minimization and Support Vector Machines. ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning.
Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research.
Covering number bounds of certain regularized linear function classes. The Journal of Machine Learning Research.
Use of the zero norm with linear models and kernel methods. The Journal of Machine Learning Research.
Feature selection, L1 vs. L2 regularization, and rotational invariance. ICML '04 Proceedings of the twenty-first international conference on Machine learning.
Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence.
The Concentration of Fractional Distances. IEEE Transactions on Knowledge and Data Engineering.

Abstract

We study the use of fractional norms for regularisation in supervised learning from high-dimensional data in the presence of a large number of irrelevant features, focusing on logistic regression. We develop a variational method for parameter estimation and show an equivalence between two approximations recently proposed in the statistics literature. Building on previous work by A. Ng, we show that fractional-norm regularised logistic regression enjoys a sample complexity that grows logarithmically with the data dimensionality and polynomially with the number of relevant dimensions. In addition, extensive empirical testing indicates that fractional-norm regularisation is more suitable than L1 regularisation when the number of relevant features is very small, and that it works very well despite a large number of irrelevant features.
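
To make the regularised objective concrete, the following is a minimal sketch of logistic regression with a fractional-norm (Lq, 0 < q < 1) penalty. It is not the variational estimator developed in the paper. The function names, the regularisation weight `lam`, and the smoothing constant `eps` (which keeps the penalty differentiable at zero) are assumptions introduced purely for illustration.

```python
import numpy as np


def lq_penalty(w, q=0.5, eps=1e-8):
    """Smoothed fractional-norm penalty: sum_j (w_j^2 + eps)^(q/2).

    With eps = 0 this equals sum_j |w_j|^q. The eps smoothing is an
    illustrative assumption, not something taken from the paper.
    """
    w = np.asarray(w, dtype=float)
    return float(np.sum((w ** 2 + eps) ** (q / 2.0)))


def lq_logistic_objective(w, X, y, lam=0.1, q=0.5, eps=1e-8):
    """Mean logistic loss (labels y in {-1, +1}) plus lam * Lq penalty."""
    margins = y * (X @ w)
    # log(1 + exp(-m)), computed stably as logaddexp(0, -m)
    loss = np.mean(np.logaddexp(0.0, -margins))
    return float(loss + lam * lq_penalty(w, q, eps))


def lq_logistic_gradient(w, X, y, lam=0.1, q=0.5, eps=1e-8):
    """Gradient of lq_logistic_objective with respect to w."""
    w = np.asarray(w, dtype=float)
    margins = y * (X @ w)
    # sigmoid(-m) = 1 / (1 + exp(m)), written with tanh for stability
    neg_sigmoid = 0.5 * (1.0 - np.tanh(margins / 2.0))
    grad_loss = -(X.T @ (y * neg_sigmoid)) / X.shape[0]
    # d/dw (w^2 + eps)^(q/2) = q * w * (w^2 + eps)^(q/2 - 1)
    grad_penalty = q * w * (w ** 2 + eps) ** (q / 2.0 - 1.0)
    return grad_loss + lam * grad_penalty
```

As a small illustration of why q < 1 favours sparse solutions: the vectors [1, 0, 0, 0] and [0.5, 0.5, 0.5, 0.5] have the same L2 norm, yet with q = 0.5 and eps = 0 the sparse vector has penalty 1 while the dense one has penalty 4 * 0.5^0.5, roughly 2.83. The objective is non-convex for q < 1, so a plain gradient method started from different points may reach different local minima.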