An improved approximation algorithm for the column subset selection problem

  • Authors:
  • Christos Boutsidis; Michael W. Mahoney; Petros Drineas

  • Affiliations:
  • Rensselaer Polytechnic Institute, Troy, NY; Stanford University, Stanford, CA; Rensselaer Polytechnic Institute, Troy, NY

  • Venue:
  • SODA '09: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2009

Abstract

We consider the problem of selecting the "best" subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the "best" rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − P_C A‖_2 ≤ O(k^{3/4} (log k)^{1/2} (n − k)^{1/4}) ‖A − A_k‖_2 with probability at least 0.7. This spectral norm bound improves upon the best previously existing result (of Gu and Eisenstat [21]) for the spectral norm version of this Column Subset Selection Problem. We also prove that ‖A − P_C A‖_F ≤ O(k √(log k)) ‖A − A_k‖_F with the same probability. This Frobenius norm bound is only a factor of √(k log k) worse than the best previously existing existential result, and is roughly O(√(k!)) better than the best previous algorithmic result (both of Deshpande et al. [11]), for the Frobenius norm version of this Column Subset Selection Problem.
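
To make the two-stage template concrete, the following is a minimal Python/NumPy sketch, assuming leverage-score sampling from the top-k right singular subspace in the randomized stage and ordinary column-pivoted QR in the deterministic stage. The paper's actual second stage uses the strong rank-revealing QR of Gu and Eisenstat [21]; the function name, the oversampling constant, and the deduplication step below are illustrative choices, not details from the paper.

```python
import numpy as np
from scipy.linalg import qr

def two_stage_column_select(A, k, c=None, seed=None):
    """Sketch of two-stage column subset selection (illustrative only).

    Stage 1 samples c = O(k log k) column indices with probabilities
    proportional to the leverage scores of the top-k right singular
    subspace of A.  Stage 2 runs column-pivoted QR on the sampled,
    rescaled subspace columns and keeps exactly k of them.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    if c is None:
        c = int(np.ceil(4 * k * np.log(max(k, 2))))  # O(k log k); 4 is ad hoc

    # Top-k right singular subspace: rows of Vt are right singular vectors,
    # so the j-th column of Vk corresponds to the j-th column of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                              # k x n

    # Leverage-score distribution: squared norms of the columns of Vk,
    # normalized by k (the rows of Vk are orthonormal, so they sum to k).
    p = (Vk ** 2).sum(axis=0) / k
    p /= p.sum()                                # guard against round-off

    # Stage 1 (randomized): sample c indices i.i.d. from p, then rescale
    # the corresponding columns as in standard importance sampling.
    idx = np.unique(rng.choice(n, size=c, replace=True, p=p))
    assert len(idx) >= k, "sampled too few distinct columns; increase c"
    S = Vk[:, idx] / np.sqrt(c * p[idx])

    # Stage 2 (deterministic): column-pivoted QR picks k well-conditioned
    # columns of S; map the pivots back to column indices of A.
    _, _, piv = qr(S, pivoting=True)
    keep = np.sort(idx[piv[:k]])
    return A[:, keep], keep
```

For example, on a random 100 × 50 matrix, two_stage_column_select(A, k=5) returns an m × 5 matrix C of actual columns of A, whose projection error ‖A − P_C A‖ can then be compared against the SVD baseline ‖A − A_5‖.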