Sparse solutions for linear prediction problems

  • Authors:
  • Dennis Shasha; Tyler Neylon

  • Affiliations:
  • New York University; New York University

  • Venue:
  • Ph.D. thesis, New York University
  • Year:
  • 2006

Abstract

The simplicity of an idea has long been regarded as a sign of elegance and, when shown to coincide with accuracy, a hallmark of profundity. In this thesis our ideas are vectors used as predictors, and sparsity is our measure of simplicity. A vector is sparse when it has few nonzero elements. We begin by asking: given a matrix whose columns are n time series (vectors which evolve in a "sliding" manner over time), what are the simplest linear identities among them? Under basic learning assumptions, we argue that such simple identities are likely to persist in the future. Our question is easily seen to be akin to finding sparse vectors in the null space of this matrix. Hence we are confronted with the problem of finding an optimally sparse basis for a vector space. This is a computationally challenging problem with many promising applications, such as iterative numerical optimization, fast dimensionality reduction, graph algorithms on cycle spaces, and of course the time series work of this thesis.

In Part I, we give a brief exposition of the questions to be addressed here: finding linear identities among time series, and asking how we may bound the generalization error by using sparse vectors as hypotheses in the machine learning versions of these problems. In Part II, we focus on the theoretical justification for maximizing sparsity as a means of learning or prediction. We'll look at sample compression schemes as a means of correlating sparsity with the capacity of a hypothesis set, as well as examine learning error bounds which support sparsity. Finally, in Part III, we'll present an increasingly sophisticated toolkit of incremental algorithms for discovering sparse patterns among evolving time series.
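The central computational object here is the null space of the time-series matrix: any vector in it encodes a linear identity among the series, and the thesis seeks the sparsest such vectors. The sketch below, assuming only NumPy/SciPy, illustrates that setup on synthetic data with a planted identity. The `sparse_identity` helper and its minimum-norm heuristic are illustrative stand-ins invented for this example, not the thesis's incremental algorithms; finding a sparsest null-space vector is hard in general.

```python
# A minimal sketch of "linear identities live in the null space",
# assuming NumPy/SciPy. Not the thesis's own algorithms.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)

# Four "time series" as columns; by construction s3 = s0 + s1,
# so a sparse identity (three nonzeros) hides in the null space.
T = 200
s0 = rng.standard_normal(T)
s1 = rng.standard_normal(T)
s2 = rng.standard_normal(T)
s3 = s0 + s1
A = np.column_stack([s0, s1, s2, s3])

# Orthonormal basis of the null space: every column v satisfies
# A @ v ~ 0, i.e. v encodes a linear identity among the series.
N = null_space(A)
print("null space dimension:", N.shape[1])

# Crude, hypothetical sparsification heuristic: pin one coordinate
# to 1, take the minimum-norm solution inside the null space, and
# threshold tiny entries. (Minimum norm is NOT minimum sparsity in
# general; it just works on this toy example.)
def sparse_identity(N, pin, tol=1e-8):
    # Minimum-norm coefficients c with (N @ c)[pin] = 1.
    c, *_ = np.linalg.lstsq(N[pin:pin + 1, :], np.ones(1), rcond=None)
    v = N @ c
    v[np.abs(v) < tol] = 0.0
    return v / v[pin]

v = sparse_identity(N, pin=3)
print("recovered identity:", np.round(v, 6))   # expect ~[-1, -1, 0, 1]
print("residual ||A v||:", np.linalg.norm(A @ v))
```

On this toy matrix the null space is one-dimensional, so the heuristic recovers the planted identity s3 = s0 + s1 exactly. With a larger null space, picking the sparsest basis is where the real difficulty, and the toolkit of Part III, comes in.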