Algorithms for linear and nonlinear approximation of large data

  • Authors: Sampath Kannan; Sudipto Guha; Boulos Harb

  • Affiliations: University of Pennsylvania; University of Pennsylvania; University of Pennsylvania

  • Venue: Algorithms for linear and nonlinear approximation of large data
  • Year: 2007

Abstract

A central problem in approximation theory is the concise representation of functions. Given a function or signal described as a vector in a high-dimensional space, the goal is to represent it as closely as possible by a linear combination of a small number of (simpler) vectors belonging to a predefined dictionary. We develop approximation algorithms for this sparse representation problem under the two principal approaches, known as linear and nonlinear approximation.

The linear approach is equivalent to over-constrained regression: given f ∈ R^n, an n × B matrix A, and a p-norm, the objective is to find x ∈ R^B minimizing ‖Ax − f‖_p. We assume that B is much smaller than n; hence the resulting problem is over-constrained. The nonlinear approach offers an extra degree of freedom: it allows us to choose the B representation vectors from a larger set. Assuming the dictionary is described by a matrix A ∈ R^{n×m}, here we seek an x ∈ R^m with at most B non-zero components that minimizes ‖Ax − f‖_p.

By providing a fast, greedy, one-pass streaming algorithm, we show that the solution to a prevalent restricted version of the nonlinear approximation problem over a compactly supported wavelet basis is an O(log n)-approximation to the optimal (unrestricted) solution for all p-norms, p ∈ [1, ∞]. For the important case of the Haar wavelet basis, we detail a fully polynomial-time approximation scheme for all p ∈ [1, ∞] based on a one-pass dynamic programming algorithm that, for p > 1, is also streaming. For other compactly supported wavelets, a similar algorithm, modified for the given wavelet basis, yields a quasi-polynomial-time approximation scheme (QPTAS). Our algorithms extend to variants of the problem such as adaptive quantization and best-basis selection.

For linear over-constrained ℓ_p regression, we demonstrate the existence of core-sets and present an efficient sampling-based approximation algorithm that computes them for all p ∈ [1, ∞). That is, our algorithm samples a number of constraints (rows of A and the corresponding elements of f) that is small and independent of n, then solves an ℓ_p regression problem on these constraints alone, producing a solution that is a (1 + ε)-approximation to the original problem. The algorithm extends to more general and commonly encountered settings such as weighted p-norms, generalized p-norms, and solutions restricted to a convex space.
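
To make the restricted nonlinear problem concrete, here is a minimal sketch in Python (assuming NumPy; the names haar_transform, inverse_haar, and restricted_synopsis are illustrative, not from the thesis). It builds a B-term Haar synopsis by retaining the B largest-magnitude coefficients of f at their exact values. By Parseval's identity this greedy choice is optimal for p = 2; for general p ∈ [1, ∞] the thesis ranks coefficients by a weighting adapted to the target norm.

    import numpy as np

    def haar_transform(f):
        """Orthonormal Haar transform of a length-2^k signal (batch version)."""
        f = np.asarray(f, dtype=float).copy()
        n = len(f)
        coeffs = np.empty(n)
        pos = n
        while n > 1:
            half = n // 2
            avg = (f[0:n:2] + f[1:n:2]) / np.sqrt(2.0)   # smooth part
            diff = (f[0:n:2] - f[1:n:2]) / np.sqrt(2.0)  # detail coefficients
            coeffs[pos - half:pos] = diff
            f[:half] = avg
            pos -= half
            n = half
        coeffs[0] = f[0]  # overall scaling coefficient
        return coeffs

    def inverse_haar(c):
        """Invert haar_transform."""
        c = np.asarray(c, dtype=float).copy()
        n = 1
        while n < len(c):
            avg, diff = c[:n].copy(), c[n:2 * n].copy()
            c[0:2 * n:2] = (avg + diff) / np.sqrt(2.0)
            c[1:2 * n:2] = (avg - diff) / np.sqrt(2.0)
            n *= 2
        return c

    def restricted_synopsis(f, B):
        """Keep the B largest-magnitude Haar coefficients of f at their exact
        values and zero out the rest (the 'restricted' version of the problem)."""
        c = haar_transform(f)
        keep = np.argsort(np.abs(c))[-B:]
        sparse = np.zeros_like(c)
        sparse[keep] = c[keep]
        return inverse_haar(sparse)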
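
The one-pass, small-space aspect can also be sketched. Again a hedge: streaming_top_coeffs below is a hypothetical helper, and ranking by magnitude corresponds to p = 2, whereas the thesis's greedy algorithm weights coefficients for the chosen p-norm. The point of the sketch is that Haar detail coefficients of a length-2^k stream can be emitted on the fly while carrying only one pending partial average per level, i.e., O(log n) state, with a size-B min-heap retaining the B largest.

    import heapq
    from math import sqrt

    def streaming_top_coeffs(stream, B):
        """One pass over a length-2^k stream: emit Haar detail coefficients on
        the fly, carrying one pending partial average per level (O(log n)
        state), and keep the B largest magnitudes in a size-B min-heap."""
        carry = {}   # level -> pending partial average awaiting its partner
        heap = []    # min-heap of (|coeff|, (level, block), coeff)
        count = 0
        for x in stream:
            count += 1
            val, level = float(x), 0
            while level in carry:          # merge upward while a partner waits
                prev = carry.pop(level)
                detail = (prev - val) / sqrt(2.0)
                val = (prev + val) / sqrt(2.0)
                block = count // 2 ** (level + 1) - 1
                item = (abs(detail), (level, block), detail)
                if len(heap) < B:
                    heapq.heappush(heap, item)
                elif item[0] > heap[0][0]:
                    heapq.heapreplace(heap, item)
                level += 1
            carry[level] = val
        return heap, carry  # carry holds the final scaling coefficient

    # Usage: the retained coefficients define a B-term synopsis of the stream.
    top, rest = streaming_top_coeffs(range(16), B=4)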
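
On the linear (regression) side, the core-set idea can be sketched for p = 2, where sampling rows with probabilities proportional to their leverage scores and rescaling is known to preserve the regression objective up to (1 + ε); the thesis's algorithm handles all p ∈ [1, ∞) with suitably generalized sampling probabilities. The helper name and the sample size used below are assumptions for illustration.

    import numpy as np

    def leverage_score_coreset(A, f, s, seed=0):
        """Sample s rows of (A, f) with probability proportional to their
        leverage scores, rescaled so the sampled objective estimates the
        full one (the p = 2 instance of the core-set construction)."""
        rng = np.random.default_rng(seed)
        Q, _ = np.linalg.qr(A)            # thin QR; columns of Q span col(A)
        lev = np.sum(Q * Q, axis=1)       # leverage of row i is ||Q[i, :]||^2
        prob = lev / lev.sum()
        idx = rng.choice(len(f), size=s, replace=True, p=prob)
        w = 1.0 / np.sqrt(s * prob[idx])  # rescaling keeps the estimate unbiased
        return w[:, None] * A[idx], w * f[idx]

    # Usage: solve least squares on the core-set instead of all n rows.
    n, B = 100_000, 5
    rng = np.random.default_rng(1)
    A = rng.standard_normal((n, B))
    f = A @ rng.standard_normal(B) + 0.1 * rng.standard_normal(n)
    A_s, f_s = leverage_score_coreset(A, f, s=500)
    x_core, *_ = np.linalg.lstsq(A_s, f_s, rcond=None)  # approximate solution
    x_full, *_ = np.linalg.lstsq(A, f, rcond=None)      # full-data optimum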