Epsilon-nets and simplex range queries
SCG '86 Proceedings of the second annual symposium on Computational geometry
Approximations and optimal geometric divide-and-conquer
Selected papers of the 23rd annual ACM symposium on Theory of computing
Sublinear time algorithms for metric space problems
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Improved bounds on the sample complexity of learning
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Algorithms for facility location problems with outliers
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Improved bounds on the sample complexity of learning
Journal of Computer and System Sciences
On coresets for k-means and k-median clustering
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Subgradient and sampling algorithms for l1 regression
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Sampling algorithms for l2 regression and applications
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
On k-Median clustering in high dimensions
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Improved Approximation Algorithms for Large Matrices via Random Projections
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Coresets for Weighted Facilities and Their Applications
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Sublinear-time approximation algorithms for clustering via random sampling
Random Structures & Algorithms - Proceedings from the 12th International Conference “Random Structures and Algorithms”, August 1-5, 2005, Poznan, Poland
Smaller Coresets for k-Median and k-Means Clustering
Discrete & Computational Geometry
Deterministic sampling and range counting in geometric data streams
ACM Transactions on Algorithms (TALG)
A PTAS for k-means clustering based on weak coresets
SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
Bi-criteria linear-time approximations for generalized k-mean/median/center
SCG '07 Proceedings of the twenty-third annual symposium on Computational geometry
Sampling-based dimension reduction for subspace approximation
STOC '07 Proceedings of the thirty-ninth annual ACM symposium on Theory of computing
Efficient subspace approximation algorithms
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
A constant factor approximation algorithm for k-median clustering with outliers
SODA '08 Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Sampling algorithms and coresets for ℓp regression
SODA '08 Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
Numerical linear algebra in the streaming model
STOC '09 Proceedings of the forty-first annual ACM symposium on Theory of computing
Universal ε-approximators for integrals
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Coresets and sketches for high dimensional subspace approximation problems
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Coresets for discrete integration and clustering
FSTTCS'06 Proceedings of the 26th international conference on Foundations of Software Technology and Theoretical Computer Science
Algorithms and hardness for subspace approximation
SODA '11 Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
FSTTCS'04 Proceedings of the 24th international conference on Foundations of Software Technology and Theoretical Computer Science
From high definition image to low space optimization
SSVM'11 Proceedings of the Third international conference on Scale Space and Variational Methods in Computer Vision
On multiplicative λ-approximations and some geometric applications
SODA '12 Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
A near-linear algorithm for projective clustering integer points
SODA '12 Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
Data reduction for weighted and outlier-resistant clustering
SODA '12 Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
An effective coreset compression algorithm for large scale sensor networks
Proceedings of the 11th international conference on Information Processing in Sensor Networks
Active clustering of biological sequences
The Journal of Machine Learning Research
Fast k-clustering queries on embeddings of road networks
Proceedings of the 3rd International Conference on Computing for Geospatial Research and Applications
The single pixel GPS: learning big data signals from tiny coresets
Proceedings of the 20th International Conference on Advances in Geographic Information Systems
Learning Big (Image) Data via Coresets for Dictionaries
Journal of Mathematical Imaging and Vision
Given a set F of n positive functions over a ground set X, we consider the problem of computing an x* ∈ X that minimizes ∑_{f ∈ F} f(x) over all x ∈ X. A typical application is shape fitting, where we wish to approximate a set P of n elements (say, points) by a shape x from a (possibly infinite) family X of shapes. Here, each point p ∈ P corresponds to a function f such that f(x) is the distance from p to x, and we seek a shape x that minimizes the sum of the distances from the points of P to x. In the k-clustering variant, each x ∈ X is a tuple of k shapes, and f(x) is the distance from p to its closest shape in x. Our main result is a unified framework for constructing coresets and approximate clusterings for such general sets of functions. To achieve our results, we forge a link between the classic and well-defined notion of ε-approximations from the theory of PAC learning and VC dimension and the relatively new (and not so consistent) paradigm of coresets, which are a kind of "compressed representation" of the input set F. Using traditional techniques, a coreset usually implies an LTAS (linear-time approximation scheme) for the corresponding optimization problem, which can be computed in parallel, in one pass over the data, and using only polylogarithmic space (i.e., in the streaming model). For several function families F for which coresets are known not to exist, or for which the corresponding (approximate) optimization problems are hard, our framework yields bicriteria approximations, or coresets that are large but contained in a low-dimensional space. We demonstrate our unified framework by applying it to projective clustering problems.
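The abstract does not say how the coresets are built. The Python sketch below is a rough illustration only. It shows the k-clustering objective ∑_{f ∈ F} f(x), with f_p(x) equal to the distance from p to its closest center in x. It then applies a generic importance-sampling reduction: draw m points with probabilities probs[i] and give each draw weight 1/(m · probs[i]). For any fixed x, the weighted cost of the sample is then an unbiased estimate of the full cost. All function names are hypothetical, and so is the sampling distribution in `hypothetical_importance`, which is half uniform and half proportional to the distance to a rough center set. This is not the paper's construction. Unbiasedness for each fixed x is also weaker than a coreset guarantee, which must hold up to a factor 1 ± ε for every x ∈ X simultaneously.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Sum over p of w(p) * f_p(x), where f_p(x) is the distance from p
    to its closest center in x (the k-clustering variant, one center per shape)."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))


def sample_weighted_subset(points: Sequence[Point], probs: Sequence[float],
                           m: int, rng: random.Random
                           ) -> Tuple[List[Point], List[float]]:
    """Draw m indices i.i.d. with P(i) = probs[i]; weight each draw by 1 / (m * probs[i]).

    Requires probs[i] > 0 for every i; then, for any fixed center set x,
    E[clustering_cost(sample, x, weights)] == clustering_cost(points, x).
    """
    idx = rng.choices(range(len(points)), weights=probs, k=m)
    sample = [points[i] for i in idx]
    weights = [1.0 / (m * probs[i]) for i in idx]
    return sample, weights


def hypothetical_importance(points: Sequence[Point],
                            rough_centers: Sequence[Point]) -> List[float]:
    """Hypothetical sampling distribution (an assumption, not from the paper):
    half uniform, half proportional to the distance to a rough center set.
    The uniform half keeps every probability strictly positive."""
    n = len(points)
    d = [min(math.dist(p, c) for c in rough_centers) for p in points]
    total = sum(d)
    if total == 0:
        return [1.0 / n] * n
    return [0.5 / n + 0.5 * di / total for di in d]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated Gaussian blobs in the plane.
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    pts += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(1000)]
    probs = hypothetical_importance(pts, rough_centers=[(0.0, 0.0)])
    sample, w = sample_weighted_subset(pts, probs, m=200, rng=rng)
    x = [(0.0, 0.0), (10.0, 10.0)]  # one candidate solution with k = 2 centers
    print("full cost:      ", clustering_cost(pts, x))
    print("sample estimate:", clustering_cost(sample, x, w))
```

Running the script prints the full cost next to the estimate from a 200-point weighted sample. The two values should be close but will not generally match exactly.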
We obtain new coreset constructions, as well as coresets significantly smaller than those that have appeared in the literature in recent years, for problems such as: k-median [Har-Peled and Mazumdar, STOC'04], [Chen, SODA'06], [Langberg and Schulman, SODA'10]; k-line median [Feldman, Fiat and Sharir, FOCS'06], [Deshpande and Varadarajan, STOC'07]; projective clustering [Deshpande et al., SODA'06], [Deshpande and Varadarajan, STOC'07]; linear ℓp regression [Clarkson and Woodruff, STOC'09]; low-rank approximation [Sarlos, FOCS'06]; and subspace approximation [Shyamalkumar and Varadarajan, SODA'07], [Feldman, Monemizadeh, Sohler and Woodruff, SODA'10], [Deshpande, Tulsiani and Vishnoi, SODA'11]. The running times of algorithms for the corresponding optimization problems are also significantly improved. We show how to generalize the results of our framework to squared distances (as in k-means), to distances raised to the q-th power, and to deterministic constructions.
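The Python sketch below assumes that, as the k-means parenthetical suggests, the generalization replaces each f_p(x) = dist(p, x) by dist(p, x)^q. The function names are illustrative. The second helper encodes a standard fact, not a claim made in the abstract: for q ≥ 1, convexity of t ↦ t^q gives (a + b)^q ≤ 2^(q−1)(a^q + b^q). As a result, q-th power distances satisfy the triangle inequality only up to a constant factor.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def power_cost(points: Sequence[Point], centers: Sequence[Point],
               q: float = 1.0) -> float:
    """Sum over p of (distance from p to its closest center) ** q.

    q = 1 gives the k-median objective and q = 2 the k-means objective.
    """
    if q < 1:
        raise ValueError("this sketch assumes q >= 1")
    return sum(min(math.dist(p, c) for c in centers) ** q for p in points)


def relaxed_triangle_bound(a: float, b: float, q: float) -> float:
    """Upper bound 2**(q - 1) * (a**q + b**q) on (a + b)**q, valid for a, b >= 0 and q >= 1.

    Follows from convexity of t -> t**q; with a = dist(p, y) and b = dist(y, c)
    it bounds dist(p, c)**q through the intermediate point y.
    """
    return 2 ** (q - 1) * (a ** q + b ** q)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    x = [(0.0, 0.0)]
    print("k-median cost (q=1):", power_cost(pts, x, q=1))  # 0 + 1 + 10
    print("k-means cost (q=2):", power_cost(pts, x, q=2))   # 0 + 1 + 100
```

The demo shows why the q-th power objectives behave differently. Raising distances to a higher power lets the far point dominate the cost, which is one reason results for q = 1 do not transfer to larger q without extra work.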