Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression

  • Authors: Xiangrui Meng; Michael W. Mahoney
  • Affiliations: LinkedIn Corporation, Mountain View, CA, USA; Stanford University, Stanford, CA, USA
  • Venue: Proceedings of the forty-fifth annual ACM symposium on Theory of Computing
  • Year: 2013


Abstract

Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for common linear algebra problems. We show that, given a matrix A ∈ R^{n×d} with n ≫ d and a p ∈ [1, 2), with constant probability we can construct a low-distortion embedding matrix Π ∈ R^{O(poly(d))×n} that embeds A_p, the ℓ_p subspace spanned by A's columns, into (R^{O(poly(d))}, ‖·‖_p); the distortion of our embeddings is only O(poly(d)), and we can compute ΠA in O(nnz(A)) time, i.e., input-sparsity time. Our result generalizes the input-sparsity time ℓ_2 subspace embedding of Clarkson and Woodruff [STOC'13]; for completeness, we present a simpler and improved analysis of their construction for ℓ_2. These input-sparsity time ℓ_p embeddings are optimal, up to constants, in terms of their running time, and the improved running time propagates to applications such as (1 ± ε)-distortion ℓ_p subspace embedding and relative-error ℓ_p regression. For ℓ_2, we show that a (1 + ε)-approximate solution to the ℓ_2 regression problem specified by the matrix A and a vector b ∈ R^n can be computed in O(nnz(A) + d³ log(d/ε)/ε²) time. For ℓ_p, via a subspace-preserving sampling procedure, we show that a (1 ± ε)-distortion embedding of A_p into R^{O(poly(d))} can be computed in O(nnz(A) · log n) time, and that a (1 + ε)-approximate solution to the ℓ_p regression problem min_{x ∈ R^d} ‖Ax − b‖_p can be computed in O(nnz(A) · log n + poly(d) log(1/ε)/ε²) time. Moreover, we can improve the embedding dimension (equivalently, the sample size) to O(d^{3+p/2} log(1/ε)/ε²) without increasing the complexity.
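To make the ℓ_2 case concrete, the following is a minimal sketch (not the authors' exact construction) of the CountSketch-style input-sparsity time embedding in the spirit of Clarkson and Woodruff: each row of A is hashed to one of m sketch rows and multiplied by a random sign, so ΠA is accumulated in a single pass over the nonzeros of A, and an approximate ℓ_2 regression solution is then obtained from the much smaller sketched problem. The function names, the choice m = 10d², and the use of a dense numpy array are illustrative assumptions.

```python
import numpy as np

def countsketch_embed(A, m, rng):
    """Apply a CountSketch-style embedding Pi to A in one pass.

    Row i of A is added, with a random sign s[i], into sketch row h[i];
    this computes Pi @ A in time proportional to nnz(A)."""
    n = A.shape[0]
    h = rng.integers(0, m, size=n)       # hash bucket for each row
    s = rng.choice([-1.0, 1.0], size=n)  # random sign for each row
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, h, s[:, None] * A)     # unbuffered accumulation
    return SA

def sketched_l2_regression(A, b, m=None, rng=None):
    """Approximate argmin_x ||Ax - b||_2 by solving the sketched problem."""
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = A.shape
    m = 10 * d * d if m is None else m   # O(poly(d)) sketch rows (assumed)
    # Sketch [A | b] jointly so A and b see the same embedding.
    S = countsketch_embed(np.hstack([A, b[:, None]]), m, rng)
    SA, Sb = S[:, :d], S[:, d]
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x
```

On a consistent system (b exactly in the column span of A), the sketched solution coincides with the true minimizer whenever SA has full column rank, which holds with high probability for m ≫ d; for general b, the theory above guarantees a (1 + ε)-approximate residual for a suitable m.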