There is a growing body of learning problems for which it is natural to organize the parameters into a matrix. This makes it possible to impose sophisticated prior knowledge by regularizing the parameters under an appropriate matrix norm. This work describes and analyzes a systematic method for constructing such matrix-based regularization techniques. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on a known duality phenomenon: a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already proved to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and multiple kernel learning.
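The duality invoked above can be stated precisely as follows (a sketch in assumed standard notation, where $f^{*}$ denotes the Fenchel conjugate of $f$ and $\|\cdot\|_{*}$ the dual norm of $\|\cdot\|$):

\[
f \ \text{is } \beta\text{-strongly convex w.r.t. } \|\cdot\|
\;\Longleftrightarrow\;
f^{*} \ \text{is } \tfrac{1}{\beta}\text{-strongly smooth w.r.t. } \|\cdot\|_{*},
\]

where $\beta$-strong convexity means $f(u) \ge f(v) + \langle \nabla f(v), u - v\rangle + \tfrac{\beta}{2}\|u - v\|^{2}$ (with subgradients in place of $\nabla f(v)$ when $f$ is not differentiable), and $\tfrac{1}{\beta}$-strong smoothness means $f^{*}(x) \le f^{*}(y) + \langle \nabla f^{*}(y), x - y\rangle + \tfrac{1}{2\beta}\|x - y\|_{*}^{2}$ for all points in the respective domains.

To illustrate how a strongly convex matrix regularizer yields an algorithm, the following is a minimal sketch, not the authors' code: online mirror descent with the squared group (2,q)-norm, $f(W) = \tfrac{1}{2}\|W\|_{2,q}^{2}$, a matrix regularizer of the kind the abstract alludes to for multi-task learning. The function names, the exponent q, and the step size eta are illustrative assumptions.

# Sketch of online mirror descent with the squared group (2,q)-norm regularizer
# f(W) = 0.5 * ||W||_{2,q}^2, where rows of W (features) are grouped across tasks.
# Its strong convexity w.r.t. the (2,q)-norm is what the stated duality would
# convert into a regret bound. Names and constants are illustrative assumptions.
import numpy as np

def grad_f(W, q):
    """Gradient (mirror map) of 0.5 * ||W||_{2,q}^2, with rows as groups."""
    r = np.linalg.norm(W, axis=1) + 1e-12            # row 2-norms (eps avoids 0/0)
    norm_2q = (r ** q).sum() ** (1.0 / q)            # ||W||_{2,q}
    return (norm_2q ** (2 - q)) * (r ** (q - 2))[:, None] * W

def grad_f_star(Theta, q):
    """Gradient of the conjugate 0.5 * ||.||_{2,p}^2: same map with the dual exponent."""
    p = q / (q - 1.0)                                # 1/p + 1/q = 1
    return grad_f(Theta, p)

def mirror_descent_step(W, G, q, eta):
    """One update: W_{t+1} = grad f*( grad f(W_t) - eta * G_t )."""
    return grad_f_star(grad_f(W, q) - eta * G, q)

# Toy usage: d features, k tasks, squared-loss gradients on random data.
rng = np.random.default_rng(0)
d, k, q, eta = 20, 5, 1.2, 0.1
W = np.zeros((d, k))
for _ in range(100):
    x, y = rng.normal(size=(d,)), rng.normal(size=(k,))
    G = np.outer(x, x @ W - y)                       # gradient of 0.5 * ||W^T x - y||^2
    W = mirror_descent_step(W, G, q, eta)

Taking q slightly above 1 couples the rows (features) across tasks and encourages shared feature selection; the strong convexity of this regularizer is exactly the property the duality above would turn into a regret bound for the update.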