The $\ell_p$ regression problem takes as input a matrix $A\in\mathbb{R}^{n\times d}$, a vector $b\in\mathbb{R}^n$, and a number $p\in[1,\infty)$, and it returns as output a number ${\cal Z}$ and a vector $x_{\text{{\sc opt}}}\in\mathbb{R}^d$ such that ${\cal Z}=\min_{x\in\mathbb{R}^d}\|Ax-b\|_p=\|Ax_{\text{{\sc opt}}}-b\|_p$. In this paper, we construct coresets and obtain an efficient two-stage sampling-based approximation algorithm for the very overconstrained ($n\gg d$) version of this classical problem, for all $p\in[1, \infty)$. The first stage of our algorithm nonuniformly samples $\hat{r}_1=O(36^p d^{\max\{p/2+1,p\}+1})$ rows of $A$ and the corresponding elements of $b$, and then it solves the $\ell_p$ regression problem on the sample; we prove this is an 8-approximation. The second stage of our algorithm uses the output of the first stage to resample $\hat{r}_1/\epsilon^2$ constraints, and then it solves the $\ell_p$ regression problem on the new sample; we prove this is a $(1+\epsilon)$-approximation. Our algorithm unifies, improves upon, and extends the existing algorithms for special cases of $\ell_p$ regression, namely, $p = 1,2$ [K. L. Clarkson, in Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, SIAM, Philadelphia, 2005, pp. 257-266; P. Drineas, M. W. Mahoney, and S. Muthukrishnan, in Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms, ACM, New York, SIAM, Philadelphia, 2006, pp. 1127-1136]. In the course of proving our result, we develop two concepts—well-conditioned bases and subspace-preserving sampling—that are of independent interest.
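The two-stage scheme described above can be illustrated with a small numerical sketch. This is not the paper's algorithm: the row norms of an orthonormal basis of $A$ stand in for the paper's well-conditioned basis scores, the sample sizes `r1` and `r2` are illustrative rather than the stated bounds, and a generic optimizer replaces an exact $\ell_p$ solver. All function names (`two_stage_lp_regression`, `sample_and_solve`, `lp_loss`) are invented for this example.

```python
import numpy as np
from scipy.optimize import minimize


def lp_loss(x, A, b, p, w):
    """Weighted l_p objective: sum_i w_i * |(Ax - b)_i|^p."""
    return np.sum(w * np.abs(A @ x - b) ** p)


def sample_and_solve(A, b, p, probs, r, rng):
    """Sample r rows with the given probabilities and solve the
    importance-weighted l_p regression on the sample."""
    idx = rng.choice(len(b), size=r, replace=True, p=probs)
    w = 1.0 / (r * probs[idx])  # importance weights keep the objective unbiased
    # Least-squares solution on the sample as a starting point.
    x0 = np.linalg.lstsq(A[idx], b[idx], rcond=None)[0]
    res = minimize(lp_loss, x0, args=(A[idx], b[idx], p, w),
                   method="Nelder-Mead")
    return res.x


def two_stage_lp_regression(A, b, p=1.5, r1=60, r2=200, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: sampling probabilities from row norms of an orthonormal
    # basis of A (a crude stand-in for a well-conditioned basis).
    Q, _ = np.linalg.qr(A)
    q = np.linalg.norm(Q, axis=1) ** p
    probs1 = q / q.sum()
    x1 = sample_and_solve(A, b, p, probs1, r1, rng)
    # Stage 2: resample with probabilities that also reflect the
    # residuals of the stage-1 solution, then solve again.
    res = np.abs(A @ x1 - b) ** p
    mix = q / q.sum() + res / res.sum()
    probs2 = mix / mix.sum()
    return sample_and_solve(A, b, p, probs2, r2, rng)
```

On a well-conditioned overconstrained instance ($n \gg d$), the stage-2 solution is typically close to the full-data $\ell_p$ minimizer even though each solve touches only a few hundred rows.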