Randomized Algorithms for Matrices and Data

Authors:
Michael W. Mahoney
Venue:
Foundations and Trends® in Machine Learning
Year:
2011

Abstract

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, largely because matrices are popular structures with which to model data drawn from a wide range of application domains, and it was performed by individuals from many different research communities. While the most obvious benefit of randomization is that it can lead to faster algorithms, whether in worst-case asymptotic theory, in numerical implementation, or both, there are numerous other benefits that are at least as important. For example, the use of randomization can lead to simpler algorithms that are easier to analyze or reason about when applied in counterintuitive settings; it can lead to algorithms with more interpretable output, which matters in applications where analyst time rather than just computational time is the limiting resource; it can lead implicitly to regularization and more robust output; and randomized algorithms can often be organized to exploit modern computational architectures better than classical numerical methods. This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis. Throughout this review, an emphasis will be placed on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these tools in large-scale data applications. Crucial in this context is the connection with the concept of statistical leverage. This concept has long been used in statistical regression diagnostics to identify outliers, and it has recently proved crucial in the development of improved worst-case matrix algorithms that are also amenable to high-quality numerical implementation and that are useful to domain scientists.
This connection arises naturally when one explicitly decouples the effect of randomization in these matrix algorithms from the underlying linear algebraic structure. This decoupling also permits much finer control in the application of randomization, as well as the easier exploitation of domain knowledge. Most of the review will focus on random sampling algorithms and random projection algorithms for versions of the linear least-squares problem and the low-rank matrix approximation problem. These two problems are fundamental in theory and ubiquitous in practice. Randomized methods solve these problems by constructing and operating on a randomized sketch of the input matrix A. For random sampling methods, the sketch consists of a small number of carefully sampled and rescaled columns/rows of A, while for random projection methods, the sketch consists of a small number of linear combinations of the columns/rows of A. Depending on the specifics of the situation, when compared with the best previously existing deterministic algorithms, the resulting randomized algorithms have worst-case running time that is asymptotically faster; their numerical implementations are faster in terms of wall-clock time; or they can be implemented in parallel computing environments where existing numerical algorithms fail to run at all. Numerous examples illustrating these observations will be described in detail.
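
To make the two kinds of sketch concrete, the following is a minimal illustration in Python with NumPy, not code taken from the monograph. It assumes a tall, dense matrix of full column rank; the function names, the sketch size r, the Gaussian projection matrix, and the oversampling parameter are all choices made here for illustration. Leverage scores are computed exactly from a QR factorization, which costs about as much as solving the least-squares problem directly, so this shows only the structure of the approach and not its speed.

```python
import numpy as np


def leverage_scores(A):
    """Statistical leverage of each row: squared row norms of an
    orthonormal basis for the column space (assumes full column rank)."""
    Q, _ = np.linalg.qr(A)  # reduced QR, Q is n x d
    return np.sum(Q**2, axis=1)


def sampled_least_squares(A, b, r, rng):
    """Random sampling sketch: keep r rows drawn with probability
    proportional to leverage, rescaled by 1/sqrt(r * p_i)."""
    p = leverage_scores(A)
    p = p / p.sum()
    idx = rng.choice(A.shape[0], size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])
    x, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x


def projected_least_squares(A, b, r, rng):
    """Random projection sketch: r random linear combinations of the rows,
    here with a dense Gaussian matrix (an illustrative choice)."""
    G = rng.standard_normal((r, A.shape[0])) / np.sqrt(r)
    x, *_ = np.linalg.lstsq(G @ A, G @ b, rcond=None)
    return x


def randomized_low_rank(A, k, oversample, rng):
    """Low-rank approximation A ~ Q @ B, where Q spans k + oversample
    random linear combinations of the columns of A."""
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)
    return Q, Q.T @ A


def residual(A, b, x):
    return np.linalg.norm(A @ x - b)


rng = np.random.default_rng(0)
n, d, r = 2000, 10, 400
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_samp = sampled_least_squares(A, b, r, rng)
x_proj = projected_least_squares(A, b, r, rng)

print("exact residual    :", residual(A, b, x_exact))
print("sampling residual :", residual(A, b, x_samp))
print("projection residual:", residual(A, b, x_proj))

# A matrix of exact rank 5 is recovered up to rounding error.
M = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
Q, B = randomized_low_rank(M, k=5, oversample=5, rng=rng)
print("low-rank relative error:", np.linalg.norm(M - Q @ B) / np.linalg.norm(M))
```

Both sketched solves operate on an r x d problem instead of the original n x d one. In this synthetic setting the sketched residuals should be only slightly larger than the exact one, though how large r must be in general depends on the matrix and on the accuracy required.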