Simple and deterministic matrix sketching

Authors:
Edo Liberty
Affiliations:
Yahoo! Labs, Haifa, Israel
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

A sketch of a matrix $A$ is another matrix $B$ that is significantly smaller than $A$ but still approximates it well. Finding such sketches efficiently is an important building block in modern algorithms for approximating, for example, the PCA of massive matrices. This task is made more challenging in the streaming model, where each row of the input matrix can be processed only once and storage is severely limited. In this paper we adapt a well-known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the $n$ rows of a large matrix $A \in \mathbb{R}^{n \times m}$ one after the other, in a streaming fashion. It maintains a sketch $B \in \mathbb{R}^{\ell \times m}$ containing only $\ell \ll n$ rows but still guarantees that $A^T A \approx B^T B$. More precisely, $\forall x, \|x\| = 1: 0 \le \|Ax\|^2 - \|Bx\|^2 \le 2\|A\|_F^2/\ell$, or equivalently $B^T B \preceq A^T A$ and $\|A^T A - B^T B\| \le 2\|A\|_F^2/\ell$. This gives a streaming algorithm whose error decays proportionally to $1/\ell$ using $O(m\ell)$ space. For comparison, random-projection, hashing, or sampling-based algorithms produce convergence bounds proportional to $1/\sqrt{\ell}$. Sketch updates per row of $A$ require amortized $O(m\ell)$ operations, and the algorithm is perfectly parallelizable. Our experiments corroborate the algorithm's scalability and improved convergence rate. The presented algorithm also stands out in that it is deterministic, simple to implement, and elementary to prove.
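To make the adaptation concrete, here is a minimal NumPy sketch of the kind of streaming procedure the abstract describes. It is an illustration under stated assumptions, not the author's implementation: the function name `frequent_directions`, the requirement that $\ell$ be even, and the choice to shrink by $\delta = \sigma_{\ell/2}^2$ whenever $B$ runs out of zero rows are assumptions made here. The analogy to the item-frequency algorithm is that, where that algorithm decrements every counter by the same amount once its table is full, this sketch subtracts the same $\delta$ from every squared singular value of $B$.

```python
import numpy as np


def frequent_directions(A: np.ndarray, ell: int) -> np.ndarray:
    """Stream the rows of A into an ell x m sketch B (illustrative sketch).

    Assumptions (not from the source): ell is even and >= 2, and each
    shrink subtracts delta = sigma_{ell/2}^2 from every squared singular value.
    """
    if ell < 2 or ell % 2 != 0:
        raise ValueError("ell must be an even integer >= 2")
    _, m = A.shape
    half = ell // 2
    B = np.zeros((ell, m))
    next_zero = 0  # index of the first all-zero row of B

    for row in A:  # each row of A is read exactly once
        B[next_zero] = row
        next_zero += 1
        if next_zero < ell:
            continue  # B still has a zero-valued row, so no shrink yet

        # B is full: rotate it onto its right singular vectors and shrink.
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        # delta = sigma_{ell/2}^2 (1-indexed); 0 if B has fewer singular values.
        delta = s[half - 1] ** 2 if len(s) >= half else 0.0
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = np.zeros((ell, m))
        B[: len(s)] = s_shrunk[:, None] * Vt
        # Every row from index min(half - 1, len(s)) onward is now zero.
        next_zero = min(half - 1, len(s))

    return B
```

As a quick check, for a random $A$ every eigenvalue of $A^T A - B^T B$ should lie in $[0, 2\|A\|_F^2/\ell]$ up to rounding. On the design choice: the sketch recomputes an SVD only when $B$ fills up, roughly once every $\ell/2$ rows, so each $O(m\ell^2)$ SVD (for $m \ge \ell$) amortizes to $O(m\ell)$ work per row.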