We discuss a simple sparse linear problem that is hard to learn with any algorithm that uses a linear combination of the training instances as its weight vector. The hardness holds even if we allow the learner to embed the instances into any higher-dimensional feature space (and use a kernel function to define the dot product between the embedded instances). These algorithms are inherently limited by the fact that after seeing k instances, only a weight space of dimension k can be spanned. Our hardness result is surprising because the same problem can be efficiently learned using the exponentiated gradient (EG) algorithm: now the component-wise logarithms of the weights are essentially a linear combination of the training instances. This algorithm enforces additional constraints on the weights (all must be non-negative and sum to one), and in some cases these constraints alone force the rank of the weight space to grow as fast as 2^k.
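To make the contrast concrete, here is a minimal sketch of the multiplicative EG update described above, written in Python with NumPy. The choice of squared loss, the learning rate eta, the problem size n, and all names (eg_update, target) are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def eg_update(w, x, y, eta=0.1):
    """One exponentiated gradient (EG) step for a linear predictor.

    The step is multiplicative, so the component-wise logarithms of the
    weights stay (essentially) a linear combination of the instances
    seen so far; the renormalization keeps w on the probability simplex
    (non-negative, summing to one).
    """
    y_hat = w @ x                      # linear prediction
    grad = 2.0 * (y_hat - y) * x       # gradient of the squared loss
    w = w * np.exp(-eta * grad)        # multiplicative (log-additive) update
    return w / w.sum()                 # project back onto the simplex

# Usage sketch: learn a sparse linear target over n features.
rng = np.random.default_rng(0)
n = 32
target = np.zeros(n)
target[3] = 1.0                        # sparse target: one relevant feature
w = np.full(n, 1.0 / n)               # uniform starting point on the simplex
for _ in range(200):
    x = rng.standard_normal(n)
    w = eg_update(w, x, target @ x)
```

A gradient-descent or kernel learner in the same loop would replace the multiplicative step with an additive one, keeping its weight vector inside the span of the instances seen so far; this is exactly the restriction the hardness result exploits.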