Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition

Authors:
Petros Drineas; Ravi Kannan; Michael W. Mahoney
Venue:
SIAM Journal on Computing
Year:
2006

Citing 0
Cited 39

Abstract

In many applications, the data consist of (or may be naturally formulated as) an $m \times n$ matrix $A$ that may be stored on disk but is too large to be read into random access memory (RAM), or too large for superlinear polynomial-time computations on it to be practical. Two algorithms are presented which, given an $m \times n$ matrix $A$, compute approximations to $A$ that are the product of three smaller matrices, $C$, $U$, and $R$, each of which may be computed rapidly. Let $A' = CUR$ be the computed approximate decomposition; both algorithms have provable bounds for the error matrix $A-A'$. In the first algorithm, $c$ columns of $A$ and $r$ rows of $A$ are randomly chosen. If the $m \times c$ matrix $C$ consists of those $c$ columns of $A$ (after appropriate rescaling) and the $r \times n$ matrix $R$ consists of those $r$ rows of $A$ (also after appropriate rescaling), then the $c \times r$ matrix $U$ may be calculated from $C$ and $R$. For any matrix $X$, let $\|X\|_F$ and $\|X\|_2$ denote its Frobenius norm and its spectral norm, respectively. It is proven that
$$ \left\|A-A'\right\|_\xi \le \min_{D:\mathrm{rank}(D)\le k} \left\|A-D\right\|_\xi + \mathrm{poly}(k,1/c) \left\|A\right\|_F $$
holds in expectation and with high probability for both $\xi \in \{2, F\}$ and for all $k=1,\ldots,\mathrm{rank}(A)$; thus, by an appropriate choice of $k$,
$$ \left\|A-A'\right\|_2 \le \epsilon \left\|A\right\|_F $$
also holds in expectation and with high probability. This algorithm may be implemented without storing the matrix $A$ in RAM, provided it can make two passes over the matrix stored in external memory and use $O(m+n)$ additional RAM (assuming that $c$ and $r$ are constants, independent of the size of the input). The second algorithm is similar, except that it approximates the matrix $C$ by randomly sampling a constant number of rows of $C$. It therefore incurs additional error, but it can be implemented in three passes over the matrix using only constant additional RAM.
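The first algorithm can be illustrated with a short NumPy sketch. This is a hypothetical illustration, not the authors' reference implementation, and several details are assumptions not stated in the abstract: length-squared sampling probabilities ($p_j = |A^{(j)}|^2/\|A\|_F^2$ for columns, and analogously $q_i$ for rows), i.i.d. sampling with replacement, and a construction of $U$ as $\Phi\Psi^T$. Here $\Phi = \sum_{t \le \ell} y_t y_t^T/\sigma_t^2(C)$ is built from the top right singular vectors of $C$, and $\Psi$ holds the sampled, rescaled rows of $C$. The helper names `length_squared_probs` and `cur_sketch` and the numerical threshold `gamma` are illustrative only. Unlike the two-pass implementation described above, the sketch holds $A$ in memory for clarity.

```python
import numpy as np


def length_squared_probs(M, axis):
    """Probabilities proportional to squared column norms (axis=0) or row norms (axis=1)."""
    sq = np.sum(M * M, axis=axis)
    return sq / sq.sum()


def cur_sketch(A, c, r, k, gamma=1e-10, rng=None):
    """Return (C, U, R) with A ~ C @ U @ R (hypothetical helper; A is held in RAM)."""
    rng = np.random.default_rng(rng)
    m, n = A.shape

    # Sample c column indices i.i.d. and rescale column j by 1/sqrt(c * p_j): C is m x c.
    p = length_squared_probs(A, axis=0)
    cols = rng.choice(n, size=c, p=p)
    C = A[:, cols] / np.sqrt(c * p[cols])

    # Sample r row indices i.i.d. and rescale row i by 1/sqrt(r * q_i): R is r x n.
    q = length_squared_probs(A, axis=1)
    rows = rng.choice(m, size=r, p=q)
    scale = np.sqrt(r * q[rows])[:, None]
    R = A[rows, :] / scale

    # Psi (r x c): the same sampled, rescaled rows, taken from C instead of A,
    # so that Psi.T @ R is a sampled estimate of C.T @ A.
    Psi = C[rows, :] / scale

    # Eigenpairs of the small c x c matrix C^T C: right singular vectors of C
    # and squared singular values, sorted in decreasing order.
    evals, Y = np.linalg.eigh(C.T @ C)
    order = np.argsort(evals)[::-1]
    evals, Y = evals[order], Y[:, order]

    # Keep at most k directions whose sigma_t^2 is not negligible relative
    # to ||C||_F^2 = trace(C^T C); this avoids dividing by values near 0.
    ell = min(k, int(np.sum(evals >= gamma * evals.sum())))
    Y_ell = Y[:, :ell]
    Phi = Y_ell @ np.diag(1.0 / evals[:ell]) @ Y_ell.T  # c x c

    U = Phi @ Psi.T  # c x r
    return C, U, R


# Usage: a rank-one matrix with strictly positive entries.
gen = np.random.default_rng(0)
A = np.outer(gen.uniform(1, 2, size=50), gen.uniform(1, 2, size=40))
C, U, R = cur_sketch(A, c=5, r=6, k=3, rng=1)
err = np.linalg.norm(A - C @ U @ R, 2) / np.linalg.norm(A, "fro")
print(C.shape, U.shape, R.shape, f"relative spectral error: {err:.2e}")
```

The rank-one case is a convenient sanity check. With these assumed probabilities, every sampled column spans the column space of $A$ and $\Psi^T R$ equals $C^T A$ exactly, so the printed error should be at rounding level. For general matrices the error is instead governed by $c$, $r$, and $k$, as in the bound above. The eigendecomposition of the $c \times c$ matrix $C^T C$ is used because only the right singular vectors and singular values of $C$ are needed.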
To achieve an additional error (beyond the best rank-$k$ approximation) of at most $\epsilon \|A\|_F$, both algorithms take time that is a low-degree polynomial in $k$, $1/\epsilon$, and $1/\delta$, where $\delta > 0$ is a failure probability; the first takes time linear in $\max(m,n)$, and the second takes time independent of $m$ and $n$. The proofs of the error bounds make important use of matrix perturbation theory and of previous work on approximating matrix multiplication and computing low-rank approximations to a matrix. The probability distribution over columns and rows and the rescaling are crucial features of the algorithms and must be chosen judiciously.
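To see why the rescaling matters, the following hypothetical NumPy sketch (not code from the paper) checks the basic fact behind sampling-based approximate matrix multiplication. If column $j$ of $A$ is drawn with probability $p_j$ and scaled by $1/\sqrt{c\,p_j}$, then $\mathbb{E}[CC^T] = AA^T$ for any probabilities that are positive on the nonzero columns. The length-squared choice $p_j \propto |A^{(j)}|^2$ is assumed here; it is the choice that minimizes the expected squared Frobenius error of a single estimate. The function name `sampled_gram` and the test sizes are illustrative. Averaging many independent estimates should approach $AA^T$.

```python
import numpy as np


def sampled_gram(A, c, rng):
    """One estimate of A @ A.T from c columns drawn i.i.d. with length-squared probabilities."""
    p = np.sum(A * A, axis=0) / np.sum(A * A)
    cols = rng.choice(A.shape[1], size=c, p=p)
    # Rescale each sampled column j by 1/sqrt(c * p_j). Each of the c terms
    # C[:, t] C[:, t]^T then has expectation sum_j p_j * A[:, j] A[:, j]^T / (c * p_j),
    # which is (A @ A.T) / c, so E[C @ C.T] = A @ A.T.
    C = A[:, cols] / np.sqrt(c * p[cols])
    return C @ C.T


rng = np.random.default_rng(0)
A = rng.standard_normal((10, 30))
exact = A @ A.T


def rel(X):
    """Relative Frobenius-norm distance from the exact product A @ A.T."""
    return np.linalg.norm(X - exact) / np.linalg.norm(exact)


single = sampled_gram(A, c=20, rng=rng)
averaged = np.mean([sampled_gram(A, c=20, rng=rng) for _ in range(2000)], axis=0)
print(f"single estimate: {rel(single):.3f}, average of 2000: {rel(averaged):.3f}")
```

A single estimate with $c = 20$ columns is typically far from $AA^T$, while the average concentrates around it. This reflects the unbiasedness that the rescaling provides, not the accuracy of any one sample.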