Sampling algorithms and coresets for ℓp regression

  • Authors:
  • Anirban Dasgupta; Petros Drineas; Boulos Harb; Ravi Kumar; Michael W. Mahoney

  • Affiliations:
  • Yahoo! Research, Sunnyvale, CA; Yahoo! Research, Sunnyvale, CA; Computer Science, University of Pennsylvania, Philadelphia, PA; Yahoo! Research, Sunnyvale, CA; Computer Science, Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2008

Abstract

The ℓp regression problem takes as input a matrix A ∈ ℝ^{n×d}, a vector b ∈ ℝ^n, and a number p ∈ [1, ∞), and it returns as output a number Z and a vector x_OPT ∈ ℝ^d such that Z = min_{x ∈ ℝ^d} ||Ax - b||_p = ||Ax_OPT - b||_p. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained (n ≫ d) version of this classical problem, for all p ∈ [1, ∞). The first stage of our algorithm non-uniformly samples r̂_1 = O(36^p d^{max{p/2+1, p}+1}) rows of A and the corresponding elements of b, and then it solves the ℓp regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample r̂_1/ε^2 constraints, and then it solves the ℓp regression problem on the new sample; we prove this is a (1 + ε)-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for the special cases of ℓp regression, namely p = 1, 2 [10, 13]. In the course of proving our result, we develop two concepts of independent interest: well-conditioned bases and subspace-preserving sampling.
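
To make the two-stage scheme concrete, the following is a minimal Python sketch, assuming numpy and scipy. The stage-1 sampling probabilities here are a crude stand-in based on the ℓp norms of the rows of A; the paper's actual probabilities are derived from a well-conditioned basis, which this sketch does not compute. The names lp_regression and two_stage_lp_sample, and the generic Powell-based solver, are illustrative choices rather than the paper's method.

    import numpy as np
    from scipy.optimize import minimize

    def lp_regression(A, b, p):
        """Solve min_x ||Ax - b||_p by generic derivative-free optimization."""
        x0 = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares warm start
        obj = lambda x: np.sum(np.abs(A @ x - b) ** p)
        return minimize(obj, x0, method="Powell").x

    def two_stage_lp_sample(A, b, p, r1, eps, rng):
        n = A.shape[0]
        # Stage 1: sample r1 rows with probability proportional to a crude
        # importance score (row lp norms); rescale the sampled rows so the
        # sampled objective is an unbiased estimate of ||Ax - b||_p^p, then
        # solve the small lp regression problem on the sample.
        q1 = np.sum(np.abs(A) ** p, axis=1)
        q1 = q1 / q1.sum()
        i1 = rng.choice(n, size=r1, replace=True, p=q1)
        w1 = (1.0 / (r1 * q1[i1])) ** (1.0 / p)
        x1 = lp_regression(w1[:, None] * A[i1], w1 * b[i1], p)

        # Stage 2: resample about r1/eps^2 rows, biased toward rows where the
        # stage-1 solution x1 has large residual, and solve again.
        res = np.abs(A @ x1 - b) ** p
        q2 = res / res.sum()                        # assumes nonzero residual
        r2 = int(np.ceil(r1 / eps ** 2))
        i2 = rng.choice(n, size=r2, replace=True, p=q2)
        w2 = (1.0 / (r2 * q2[i2])) ** (1.0 / p)
        return lp_regression(w2[:, None] * A[i2], w2 * b[i2], p)

For example, on a random overconstrained instance (all parameter values below are arbitrary):

    rng = np.random.default_rng(0)
    A = rng.standard_normal((10000, 5))
    b = A @ rng.standard_normal(5) + 0.1 * rng.standard_normal(10000)
    x = two_stage_lp_sample(A, b, p=1.5, r1=200, eps=0.5, rng=rng)

The rescaling weights (1/(r·q_i))^{1/p} are chosen so that the expected ℓp^p objective over the sample equals the full objective, which is what lets the small sampled problem stand in for the original one.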