Matrix approximation and projective clustering via volume sampling

  • Authors:
  • Amit Deshpande; Luis Rademacher; Santosh Vempala; Grant Wang

  • Affiliations:
  • CSAIL, MIT (all authors)

  • Venue:
  • SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete Algorithms
  • Year:
  • 2006


Abstract

Frieze et al. [17] proved that a small sample of rows of a given matrix A contains a low-rank approximation D such that ||A - D||_F is within a small additive error of the optimum, and the sampling can be done efficiently using just two passes over the matrix [12]. In this paper, we generalize this result in two ways. First, we prove that the additive error drops exponentially by iterating the sampling in an adaptive manner. Using this result, we give a pass-efficient algorithm for computing a low-rank approximation with reduced additive error. Our second result is that, using a natural distribution on subsets of rows (called volume sampling), there exists a subset of k rows whose span contains a factor (k + 1) relative approximation, and a subset of k + k(k + 1)/ε rows whose span contains a (1 + ε) relative approximation. The existence of such a small certificate for multiplicative low-rank approximation leads to a PTAS for the following projective clustering problem: given a set of points P in R^d and integers k, j, find a set of j subspaces F_1, . . ., F_j, each of dimension at most k, that minimize Σ_{p ∈ P} min_i d(p, F_i)^2.
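The two sampling schemes named in the abstract can be sketched concretely. Below is a minimal Python illustration (assuming NumPy): adaptive_row_sampling iterates squared-norm sampling against the current residual, the adaptive scheme whose additive error drops with each round, and volume_sampling_probs computes the volume-sampling distribution by brute force, drawing a k-subset S of rows with probability proportional to det(A_S A_S^T). Function names and parameters are illustrative, not taken from the paper, and the subset enumeration is exponential in n, so it is for intuition only.

```python
import numpy as np
from itertools import combinations

def adaptive_row_sampling(A, k, rounds, seed=None):
    """Sketch of adaptive sampling: in each round, sample k rows with
    probability proportional to the squared norm of their residual
    against the span of the rows picked so far."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = np.empty((0, d))          # rows sampled so far
    E = A.copy()                  # residual: A minus its projection onto span(S)
    for _ in range(rounds):
        p = (E ** 2).sum(axis=1)
        p = p / p.sum()
        idx = rng.choice(n, size=k, replace=True, p=p)
        S = np.vstack([S, A[idx]])
        # project the rows of A onto the row span of S via an orthonormal basis
        Q, _ = np.linalg.qr(S.T)  # columns of Q span the row space of S
        E = A - A @ Q @ Q.T
    return S, E

def volume_sampling_probs(A, k):
    """Naive volume sampling over k-subsets of rows: Pr[S] is proportional
    to det(A_S A_S^T), the squared volume of the parallelepiped spanned by
    the rows indexed by S."""
    n = A.shape[0]
    subsets = list(combinations(range(n), k))
    vols = np.array([np.linalg.det(A[list(S)] @ A[list(S)].T) for S in subsets])
    vols = np.clip(vols, 0, None)  # guard against tiny negative determinants
    return subsets, vols / vols.sum()
```

For example, calling adaptive_row_sampling(A, k, rounds=t) returns a sample of t*k rows whose span is the candidate subspace for the rank-k approximation; the residual E measures the remaining additive error in the Frobenius norm.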