Coresets and sketches for high dimensional subspace approximation problems

  • Authors and affiliations:
  • Dan Feldman (Tel Aviv University, Tel Aviv, Israel)
  • Morteza Monemizadeh (University of Dortmund, Germany)
  • Christian Sohler (University of Dortmund, Germany)
  • David P. Woodruff (IBM Almaden Research Center, San Jose, CA)

  • Venue:
  • SODA '10: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2010

Abstract

We consider the problem of approximating a set P of n points in ℝ^d by a j-dimensional subspace under the ℓ_p measure, in which we wish to minimize the sum of ℓ_p distances from each point of P to this subspace. More generally, the F_q(ℓ_p)-subspace approximation problem asks for a j-subspace that minimizes the sum of q-th powers of ℓ_p-distances to this subspace, up to a multiplicative factor of (1 + ε). We develop techniques for subspace approximation, regression, and matrix approximation that can be used to deal with massive data sets in high-dimensional spaces. In particular, we develop coresets and sketches, i.e., small-space representations that approximate the input point set P with respect to the subspace approximation problem. Our results are:

  • A dimensionality reduction method that can be applied to F_q(ℓ_p)-clustering and shape-fitting problems, such as those in [8, 15].
  • The first strong coreset for F_1(ℓ_2)-subspace approximation in high-dimensional spaces, i.e., of size polynomial in the dimension of the space. This coreset approximates the distances to any j-subspace (not just the optimal one).
  • A (1 + ε)-approximation algorithm for the j-dimensional F_1(ℓ_2)-subspace approximation problem with running time nd(j/ε)^O(1) + (n + d)·2^poly(j/ε).
  • A streaming algorithm that maintains a coreset for the F_1(ℓ_2)-subspace approximation problem and uses space of d·[EQUATION] (weighted) points.
  • Streaming algorithms for the above problems with bounded precision in the turnstile model, i.e., when coordinates appear in an arbitrary order and undergo multiple updates. We show that bounded precision can lead to further improvements.

We extend the results of [7] for approximate linear regression, distances to subspace approximation, and optimal rank-j approximation to error measures other than the Frobenius norm.
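For concreteness, the F_q(ℓ_p) objective can be written as a formula. This is only a restatement of the definition given in the abstract, with dist_p denoting the ℓ_p distance from a point to a subspace:

    \min_{\substack{S \subseteq \mathbb{R}^d \\ \dim(S) = j}} \; \sum_{x \in P} \operatorname{dist}_p(x, S)^q ,
    \qquad \operatorname{dist}_p(x, S) = \min_{y \in S} \| x - y \|_p .

The sum-of-distances problem in the first sentence is the case q = 1, p = 2, while F_2(ℓ_2) is the classical least-squares problem solved exactly by the truncated singular value decomposition.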
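As a minimal illustration of the objective (not the paper's algorithm; the function names and the SVD baseline below are our own), the following numpy sketch evaluates the F_q(ℓ_2) cost of a candidate j-subspace and uses the exact F_2(ℓ_2) optimum as a baseline:

    import numpy as np

    def fq_l2_cost(P, B, q=1):
        """F_q(l_2) objective: sum over points (rows of P) of the q-th power
        of the Euclidean distance to the subspace spanned by the
        orthonormal rows of B."""
        residual = P - P @ B.T @ B          # component of each point outside span(B)
        return float((np.linalg.norm(residual, axis=1) ** q).sum())

    def best_f2_subspace(P, j):
        """Optimal j-subspace for q = 2 (truncated SVD / PCA). For q != 2
        no closed form is known; this is only a baseline."""
        _, _, Vt = np.linalg.svd(P, full_matrices=False)
        return Vt[:j]                        # orthonormal basis, shape (j, d)

    rng = np.random.default_rng(0)
    P = rng.standard_normal((1000, 50))      # n = 1000 points in R^50
    B = best_f2_subspace(P, j=5)
    print(fq_l2_cost(P, B, q=1))             # F_1(l_2) cost of the SVD subspace

For q = 1 the SVD subspace is in general suboptimal, which is part of what makes the (1 + ε)-approximation result above nontrivial.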