Fast approximation of matrix coherence and statistical leverage

Authors:
Petros Drineas;Malik Magdon-Ismail;Michael W. Mahoney;David P. Woodruff
Affiliations:
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY;Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY;Department of Mathematics, Stanford University, Stanford, CA;IBM Almaden Research Center, San Jose, CA
Venue:
The Journal of Machine Learning Research
Year:
2012

Abstract

The statistical leverage scores of a matrix A are the squared row norms of the matrix containing its (top) left singular vectors, and the coherence is the largest leverage score. These quantities are of interest in recently popular problems such as matrix completion and Nyström-based low-rank matrix approximation, as well as in large-scale statistical data analysis applications more generally. They also matter because they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd²) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, as an application of Johnson-Lindenstrauss-type ideas. Several practically important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
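The contrast between the naïve computation and a random-projection estimate can be illustrated with a short NumPy sketch. exact_leverage_scores follows the naïve route: an orthonormal basis via QR, then squared row norms, in O(nd²) time. approx_leverage_scores is a hypothetical illustration of the Johnson-Lindenstrauss-type idea, not the authors' algorithm. It assumes A has full column rank, and the function names and sketch sizes r1 and r2 are illustrative choices rather than the paper's parameters. It also substitutes a dense Gaussian matrix for the fast structured projection an O(nd log n) method would need, so it shows why the estimates are relative-error accurate, not how to achieve the stated running time.

```python
import numpy as np


def exact_leverage_scores(A):
    """Naive O(n d^2) route: orthonormal basis for range(A), then squared row norms."""
    Q, _ = np.linalg.qr(A)  # reduced QR: Q is n x d with orthonormal columns
    return np.sum(Q ** 2, axis=1)


def approx_leverage_scores(A, r1, r2, rng=None):
    """Two-stage random-projection estimate of the leverage scores (illustrative).

    Assumes A (n x d) has full column rank and d <= r1 << n.
    Stage 1: if Pi1 @ A = Q R, then A @ inv(R) has nearly orthonormal
    columns, so its squared row norms approximate the leverage scores.
    Stage 2: a d x r2 Gaussian JL matrix Pi2 approximately preserves those
    row norms, so computing A @ inv(R) @ Pi2 costs O(n d r2) instead of the
    O(n d^2) needed to form A @ inv(R) itself.
    """
    rng = np.random.default_rng(rng)
    n, d = A.shape
    # Dense Gaussian sketch standing in for a fast structured projection;
    # forming Pi1 @ A this way costs O(n d r1), not O(n d log n).
    Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)
    _, R = np.linalg.qr(Pi1 @ A)  # R is d x d upper triangular
    Pi2 = rng.standard_normal((d, r2)) / np.sqrt(r2)
    X = np.linalg.solve(R, Pi2)  # X = inv(R) @ Pi2 without forming inv(R)
    Omega = A @ X  # n x r2
    return np.sum(Omega ** 2, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 5))
    A[0] *= 1000.0  # make row 0 a high-leverage row
    exact = exact_leverage_scores(A)
    approx = approx_leverage_scores(A, r1=500, r2=400, rng=1)
    print("exact coherence:", exact.max())
    print("approx coherence:", approx.max())
    print("median relative error:", np.median(np.abs(approx / exact - 1)))
```

Since a Gaussian matrix embeds any fixed d-dimensional subspace, r1 only needs to grow roughly with d, and r2 roughly with log n, for a given accuracy. The exact leverage scores also sum to the rank of A, which gives a quick sanity check.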