The simplicity of an idea has long been regarded as a sign of elegance and, when shown to coincide with accuracy, a hallmark of profundity. In this thesis our ideas are vectors used as predictors, and sparsity is our measure of simplicity. A vector is sparse when it has few nonzero elements. We begin by asking the question: given a matrix of n time series (vectors which evolve in a "sliding" manner over time) as columns, what are the simplest linear identities among them? Under basic learning assumptions, we argue that such simple identities are likely to persist into the future. Our question amounts to finding sparse vectors in the null space of this matrix. Hence we are confronted with the problem of finding an optimally sparse basis for a vector space. This is a computationally challenging problem with many promising applications, such as iterative numerical optimization, fast dimensionality reduction, graph algorithms on cycle spaces, and of course the time series work of this thesis.

In part I, we give a brief exposition of the questions to be addressed here: finding linear identities among time series, and asking how we may bound the generalization error by using sparse vectors as hypotheses in the machine learning versions of these problems. In part II, we focus on the theoretical justification for maximizing sparsity as a means of learning or prediction. We'll look at sample compression schemes as a means of correlating sparsity with the capacity of a hypothesis set, as well as examine learning error bounds which support sparsity. Finally, in part III, we'll develop an increasingly sophisticated toolkit of incremental algorithms for discovering sparse patterns among evolving time series.
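To make the central object concrete, here is a minimal illustrative sketch (not an algorithm from the thesis): when one column of the matrix is a linear combination of a few others, that relationship appears as a sparse vector in the matrix's null space. The planted identity below is an assumption chosen purely for illustration.

```python
import numpy as np

# Four time series of length 50 as columns of M; the third series is
# planted as the sum of the first two, so the sparse vector
# v = (1, 1, -1, 0) satisfies the linear identity M @ v = 0.
rng = np.random.default_rng(0)
c0 = rng.standard_normal(50)
c1 = rng.standard_normal(50)
c2 = c0 + c1                   # planted identity: c0 + c1 - c2 = 0
c3 = rng.standard_normal(50)   # an unrelated series
M = np.column_stack([c0, c1, c2, c3])

v = np.array([1.0, 1.0, -1.0, 0.0])  # sparse null-space vector
print(np.allclose(M @ v, 0))         # the identity holds exactly
```

A generic null-space routine (e.g. one based on the SVD) would recover the same one-dimensional null space, but typically as a dense unit vector; the thesis's problem is precisely to find a basis for that space whose vectors are as sparse as possible.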