Given an m × n matrix A and an integer k less than the rank of A, the "best" rank-k approximation to A that minimizes the error with respect to the Frobenius norm is A_k, which is obtained by projecting A on the top k left singular vectors of A. While A_k is routinely used in data analysis, it is difficult to interpret and understand it in terms of the original data, namely the columns and rows of A. For example, these columns and rows often come from some application domain, whereas the singular vectors are linear combinations of (up to all) the columns or rows of A. We address the problem of obtaining low-rank approximations that are directly interpretable in terms of the original columns or rows of A. Our main results are two polynomial-time randomized algorithms that take as input a matrix A and return as output a matrix C, consisting of a "small" (i.e., a low-degree polynomial in k, 1/ε, and log(1/δ)) number of actual columns of A, such that ‖A − CC⁺A‖_F ≤ (1 + ε) ‖A − A_k‖_F with probability at least 1 − δ. Our algorithms are simple, and they take time of the order of the time needed to compute the top k right singular vectors of A. In addition, they sample the columns of A via the method of "subspace sampling," so named since the sampling probabilities depend on the lengths of the rows of the top singular vectors and since they ensure that we capture entirely a certain subspace of interest.
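The subspace-sampling idea described above can be sketched in a few lines of NumPy: compute the top k right singular vectors, turn the squared lengths of their rows into sampling probabilities (these are the so-called leverage scores), and draw actual columns of A according to them. This is a minimal illustration under assumed parameter choices (the function name, the number of samples c, and the with-replacement sampling are illustrative; the paper's algorithms fix c and the exact sampling scheme to obtain the stated (1 + ε) guarantee).

```python
import numpy as np

def subspace_sample_columns(A, k, c, seed=None):
    """Sketch of subspace sampling: draw c columns of A with probabilities
    proportional to the squared row lengths of the top-k right singular
    vectors (illustrative, not the paper's exact parameter setting)."""
    rng = np.random.default_rng(seed)
    # Rows of Vt are the right singular vectors of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                        # k x n
    p = np.sum(Vk ** 2, axis=0) / k       # probabilities; sums to 1
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx], idx

# Compare the column-based error ||A - CC^+A||_F with the optimal
# rank-k error ||A - A_k||_F on a synthetic low-rank-plus-noise matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30)) @ rng.standard_normal((30, 40))
k, c = 5, 20
C, idx = subspace_sample_columns(A, k, c, seed=1)
err_C = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, "fro")
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]
err_k = np.linalg.norm(A - Ak, "fro")
```

Note that CC⁺A is the projection of A onto the span of the chosen columns, so err_C can be compared directly against the benchmark err_k from the abstract.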