Latent semantic indexing: a probabilistic analysis
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Database-friendly random projections
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Pass efficient algorithms for approximating large matrices
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)
Fast Monte-Carlo Algorithms for finding low-rank approximations
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Finding Repeated Elements
Improved Approximation Algorithms for Large Matrices via Random Projections
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Fast computation of low-rank matrix approximations
Journal of the ACM (JACM)
Sampling from large matrices: An approach through geometric functional analysis
Journal of the ACM (JACM)
Relative-Error $CUR$ Matrix Decompositions
SIAM Journal on Matrix Analysis and Applications
An improved approximation algorithm for the column subset selection problem
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Numerical linear algebra in the streaming model
Proceedings of the forty-first annual ACM symposium on Theory of computing
Feature hashing for large scale multitask learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
A sparse Johnson: Lindenstrauss transform
Proceedings of the forty-second ACM symposium on Theory of computing
Faster least squares approximation
Numerische Mathematik
A note on element-wise matrix sparsification via a matrix-valued Bernstein inequality
Information Processing Letters
Near Optimal Column-Based Matrix Reconstruction
FOCS '11 Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science
Sparser Johnson-Lindenstrauss transforms
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
A fast random sampling algorithm for sparsifying matrices
APPROX'06/RANDOM'06 Proceedings of the 9th international conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th international conference on Randomization and Computation
Adaptive sampling and fast low-rank matrix approximation
APPROX'06/RANDOM'06 Proceedings of the 9th international conference on Approximation Algorithms for Combinatorial Optimization Problems, and 10th international conference on Randomization and Computation
Strong converse for identification via quantum channels
IEEE Transactions on Information Theory
Low rank approximation and regression in input sparsity time
Proceedings of the forty-fifth annual ACM symposium on Theory of computing
Hi-index | 0.00 |
A sketch of a matrix A is another matrix B which is significantly smaller than A but still approximates it well. Finding such sketches efficiently is an important building block in modern algorithms for approximating, for example, the PCA of massive matrices. This task is made more challenging in the streaming model, where each row of the input matrix can only be processed once and storage is severely limited. In this paper we adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives n rows of a large matrix A ε ℜ n x m one after the other in a streaming fashion. It maintains a sketch B ℜ l x m containing only l n rows but still guarantees that ATA BTB. More accurately, ∀x || x,||=1 0≤||Ax||2 - ||Bx||2 ≤ 2||A||_f 2 l Or BTB prec ATA and ||ATA - BTB|| ≤ 2 ||A||f2 l. This gives a streaming algorithm whose error decays proportional to 1/l using O(ml) space. For comparison, random-projection, hashing or sampling based algorithms produce convergence bounds proportional to 1/√l. Sketch updates per row in A require amortized O(ml) operations and the algorithm is perfectly parallelizable. Our experiments corroborate the algorithm's scalability and improved convergence rate. The presented algorithm also stands out in that it is deterministic, simple to implement and elementary to prove.