Sampling algorithms and coresets for ℓp regression

  • Authors:
  • Anirban Dasgupta; Petros Drineas; Boulos Harb; Ravi Kumar; Michael W. Mahoney

  • Affiliations:
  • Yahoo! Research, Sunnyvale, CA; Yahoo! Research, Sunnyvale, CA; Computer Science, University of Pennsylvania, Philadelphia, PA; Yahoo! Research, Sunnyvale, CA; Computer Science, Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2008

Abstract

The ℓp regression problem takes as input a matrix A ∈ ℝ^{n×d}, a vector b ∈ ℝ^n, and a number p ∈ [1, ∞), and it returns as output a number Z and a vector x_OPT ∈ ℝ^d such that Z = min_{x ∈ ℝ^d} ||Ax - b||_p = ||Ax_OPT - b||_p. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained (n ≫ d) version of this classical problem, for all p ∈ [1, ∞). The first stage of our algorithm non-uniformly samples r̂_1 = O(36^p d^{max{p/2+1, p}+1}) rows of A and the corresponding elements of b, and then it solves the ℓp regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample r̂_1/ε^2 constraints, and then it solves the ℓp regression problem on the new sample; we prove this is a (1 + ε)-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for the special cases of ℓp regression, namely p = 1, 2 [10, 13]. In the course of proving our result, we develop two concepts of independent interest: well-conditioned bases and subspace-preserving sampling.
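
To make the two-stage scheme concrete, the following is a minimal Python sketch, assuming numpy and scipy. The stage-1 sampling probabilities here are a crude stand-in based on the ℓp norms of the rows of A; the paper's actual probabilities are derived from a well-conditioned basis, which this sketch does not compute. The names lp_regression and two_stage_lp_sample, and the generic Powell-based solver, are illustrative choices rather than the paper's method.

    import numpy as np
    from scipy.optimize import minimize

    def lp_regression(A, b, p):
        """Solve min_x ||Ax - b||_p by generic derivative-free optimization."""
        x0 = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares warm start
        obj = lambda x: np.sum(np.abs(A @ x - b) ** p)
        return minimize(obj, x0, method="Powell").x

    def two_stage_lp_sample(A, b, p, r1, eps, rng):
        n = A.shape[0]
        # Stage 1: sample r1 rows with probability proportional to a crude
        # importance score (row lp norms); rescale the sampled rows so the
        # sampled objective is an unbiased estimate of ||Ax - b||_p^p, then
        # solve the small lp regression problem on the sample.
        q1 = np.sum(np.abs(A) ** p, axis=1)
        q1 = q1 / q1.sum()
        i1 = rng.choice(n, size=r1, replace=True, p=q1)
        w1 = (1.0 / (r1 * q1[i1])) ** (1.0 / p)
        x1 = lp_regression(w1[:, None] * A[i1], w1 * b[i1], p)

        # Stage 2: resample about r1/eps^2 rows, biased toward rows where the
        # stage-1 solution x1 has large residual, and solve again.
        res = np.abs(A @ x1 - b) ** p
        q2 = res / res.sum()                        # assumes nonzero residual
        r2 = int(np.ceil(r1 / eps ** 2))
        i2 = rng.choice(n, size=r2, replace=True, p=q2)
        w2 = (1.0 / (r2 * q2[i2])) ** (1.0 / p)
        return lp_regression(w2[:, None] * A[i2], w2 * b[i2], p)

For example, on a random overconstrained instance (all parameter values below are arbitrary):

    rng = np.random.default_rng(0)
    A = rng.standard_normal((10000, 5))
    b = A @ rng.standard_normal(5) + 0.1 * rng.standard_normal(10000)
    x = two_stage_lp_sample(A, b, p=1.5, r1=200, eps=0.5, rng=rng)

The rescaling weights (1/(r·q_i))^{1/p} are chosen so that the expected ℓp^p objective over the sample equals the full objective, which is what lets the small sampled problem stand in for the original one.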