We discuss a simple sparse linear problem that is hard to learn with any algorithm that uses a linear combination of the training instances as its weight vector. The hardness holds even if we allow the learner to embed the instances into any higher-dimensional feature space (and use a kernel function to define the dot product between the embedded instances). These algorithms are inherently limited by the fact that after seeing k instances, only a weight space of dimension k can be spanned. Our hardness result is surprising because the same problem can be efficiently learned using the exponentiated gradient (EG) algorithm: now the component-wise logarithms of the weights are essentially a linear combination of the training instances. This algorithm enforces additional constraints on the weights (all must be non-negative and sum to one), and in some cases these constraints alone force the rank of the weight space to grow as fast as 2^k.
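To make the contrast concrete, here is a minimal sketch of the multiplicative EG update described above, written in Python with NumPy. The choice of squared loss, the learning rate eta, the problem size n, and all names (eg_update, target) are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def eg_update(w, x, y, eta=0.1):
    """One exponentiated gradient (EG) step for a linear predictor.

    The step is multiplicative, so the component-wise logarithms of the
    weights stay (essentially) a linear combination of the instances
    seen so far; the renormalization keeps w on the probability simplex
    (non-negative, summing to one).
    """
    y_hat = w @ x                      # linear prediction
    grad = 2.0 * (y_hat - y) * x       # gradient of the squared loss
    w = w * np.exp(-eta * grad)        # multiplicative (log-additive) update
    return w / w.sum()                 # project back onto the simplex

# Usage sketch: learn a sparse linear target over n features.
rng = np.random.default_rng(0)
n = 32
target = np.zeros(n)
target[3] = 1.0                        # sparse target: one relevant feature
w = np.full(n, 1.0 / n)               # uniform starting point on the simplex
for _ in range(200):
    x = rng.standard_normal(n)
    w = eg_update(w, x, target @ x)
```

A gradient-descent or kernel learner in the same loop would replace the multiplicative step with an additive one, keeping its weight vector inside the span of the instances seen so far; this is exactly the restriction the hardness result exploits.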