Sampling algorithms for l2 regression and applications

  • Authors:
  • Petros Drineas; Michael W. Mahoney; S. Muthukrishnan

  • Affiliations:
  • Rensselaer Polytechnic Institute, Troy, New York; Yahoo Research Labs, Sunnyvale, California; Rutgers University, New Brunswick, New Jersey

  • Venue:
  • SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete Algorithms
  • Year:
  • 2006

Abstract

We present and analyze a sampling algorithm for the basic linear-algebraic problem of l2 regression. The l2 regression (or least-squares fit) problem takes as input a matrix A ∈ R^{n×d} (where we assume n ≫ d) and a target vector b ∈ R^n, and it returns as output Z = min_{x ∈ R^d} ||b − Ax||_2. Also of interest is x_opt = A^+ b, where A^+ is the Moore–Penrose generalized inverse; x_opt is the minimum-length vector achieving the minimum. Our algorithm randomly samples r rows from the matrix A and the vector b to construct an induced l2 regression problem with many fewer rows but the same number of columns. A crucial feature of the algorithm is its nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of A and on the manner in which b lies in the complement of the column space of A. Under appropriate assumptions, we show relative-error approximations for both Z and x_opt. Applications of this sampling methodology are briefly discussed.
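As an illustration of the row-sampling idea described in the abstract, the following is a minimal Python/NumPy sketch, not the authors' exact algorithm: it samples rows with probabilities proportional to the squared row norms of the left singular vectors of A (the paper's probabilities also account for how b lies relative to the column space of A, which this sketch omits), rescales the sampled rows, and solves the induced smaller least-squares problem. The function name and parameters are illustrative choices, not from the paper.

```python
import numpy as np

def sampled_l2_regression(A, b, r, rng=None):
    """Sketch: leverage-score row sampling for least squares.

    Samples r rows of (A, b) with probabilities proportional to the
    squared Euclidean norms of the rows of A's left singular vectors,
    rescales them, and solves the induced r-row regression problem.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    # Thin SVD: the columns of U span the column space of A.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Row norms squared of U ("leverage scores"); these sum to d.
    lev = np.sum(U**2, axis=1)
    p = lev / lev.sum()
    # Sample r row indices with the nonuniform probabilities p.
    idx = rng.choice(n, size=r, replace=True, p=p)
    # Rescale each sampled row by 1/sqrt(r * p_i), the standard
    # rescaling that keeps the sampled problem an unbiased sketch.
    scale = 1.0 / np.sqrt(r * p[idx])
    A_s = A[idx] * scale[:, None]
    b_s = b[idx] * scale
    # Solve the much smaller r-by-d least-squares problem.
    x_tilde, *_ = np.linalg.lstsq(A_s, b_s, rcond=None)
    return x_tilde
```

When b lies exactly in the column space of A, the sampled problem is solved exactly (given the sampled matrix has full column rank); in general, the quality of the approximation to Z and x_opt depends on r and on the sampling probabilities, as the paper's analysis makes precise.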