On selecting a maximum volume sub-matrix of a matrix and related problems

Authors:
Ali Çivril;Malik Magdon-Ismail
Affiliations:
Rensselaer Polytechnic Institute, Computer Science Department, 110 8th Street Troy, NY 12180, USA;Rensselaer Polytechnic Institute, Computer Science Department, 110 8th Street Troy, NY 12180, USA
Venue:
Theoretical Computer Science
Year:
2009



Abstract

Given a matrix $A \in \mathbb{R}^{m \times n}$ ($n$ vectors in $m$ dimensions), we consider the problem of selecting a subset of its columns such that its elements are as linearly independent as possible. This notion turned out to be important in low-rank approximations to matrices and rank revealing QR factorizations which have been investigated in the linear algebra community and can be quantified in a few different ways. In this paper, from a complexity theoretic point of view, we propose four related problems in which we try to find a sub-matrix $C \in \mathbb{R}^{m \times k}$ of a given matrix $A \in \mathbb{R}^{m \times n}$ such that (i) $\sigma_{\max}(C)$ (the largest singular value of $C$) is minimum, (ii) $\sigma_{\min}(C)$ (the smallest singular value of $C$) is maximum, (iii) $\kappa(C) = \sigma_{\max}(C)/\sigma_{\min}(C)$ (the condition number of $C$) is minimum, and (iv) the volume of the parallelepiped defined by the column vectors of $C$ is maximum. We establish the NP-hardness of these problems and further show that they do not admit PTAS. We then study a natural Greedy heuristic for the maximum volume problem and show that it has approximation ratio $2^{-O(k \log k)}$. Our analysis of the Greedy heuristic is tight to within a logarithmic factor in the exponent, which we show by explicitly constructing an instance for which the Greedy heuristic is $2^{-\Omega(k)}$ from optimal. When $A$ has unit norm columns, a related problem is to select the maximum number of vectors with a given volume. We show that if the optimal solution selects $k$ columns, then Greedy will select $\Omega(k/\log k)$ columns, providing a $\log k$ approximation.
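
To make the objectives in (iii) and (iv) and the Greedy heuristic concrete, the following is a minimal sketch in Python with NumPy. The abstract does not spell out the Greedy rule, so the code assumes the common formulation: repeatedly pick the column with the largest residual norm, then project that direction out of all columns. The function names `volume`, `condition_number`, and `greedy_max_volume`, and the tolerance `tol`, are illustrative choices and not taken from the paper.

```python
import numpy as np


def volume(C):
    """Volume of the parallelepiped spanned by the columns of C.

    Computed as the product of the singular values of C, which
    equals sqrt(det(C^T C)) when C has at least as many rows as columns.
    """
    return float(np.prod(np.linalg.svd(C, compute_uv=False)))


def condition_number(C):
    """kappa(C) = sigma_max(C) / sigma_min(C)."""
    s = np.linalg.svd(C, compute_uv=False)  # sorted in descending order
    return float(s[0] / s[-1])


def greedy_max_volume(A, k, tol=1e-12):
    """Assumed Greedy rule: largest residual norm first.

    Returns the indices of up to k selected columns of A.
    """
    R = np.array(A, dtype=float)            # residual copy of A
    available = np.ones(R.shape[1], dtype=bool)
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms = np.where(available, norms, -1.0)  # skip chosen columns
        j = int(np.argmax(norms))
        if norms[j] <= tol:                 # remaining columns are dependent
            break
        chosen.append(j)
        available[j] = False
        q = R[:, j] / norms[j]              # unit vector of the chosen residual
        R -= np.outer(q, q @ R)             # remove the q-component everywhere
    return chosen


# Small illustrative instance (made up for this sketch).
A = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
cols = greedy_max_volume(A, 2)
C = A[:, cols]
print(cols, volume(C), condition_number(C))
```

Under this rule, the volume of the selected sub-matrix equals the product of the residual norms at the moments each column was picked, which is why picking the largest residual at each step is a natural heuristic for objective (iv).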