Sublinear time approximate clustering

Authors:
Nina Mishra;Dan Oblinger;Leonard Pitt
Affiliations:
Hewlett-Packard Labs, Palo Alto, CA;IBM TJ Watson Labs;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
SODA '01 Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms
Year:
2001

Citing 16
Cited 37

A theory of the learnable

Communications of the ACM
A unified approach to approximation algorithms for bottleneck problems

Journal of the ACM (JACM)
Occam's razor

Information Processing Letters
Optimal algorithms for approximate clustering

STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Decision theoretic generalizations of the PAC model for neural net and other learning applications

Information and Computation
On learning Read-k-Satisfy-j DNF

COLT '94 Proceedings of the seventh annual conference on Computational learning theory
Incremental clustering and dynamic information retrieval

STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
A constant-factor approximation algorithm for the k-median problem (extended abstract)

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Sublinear time algorithms for metric space problems

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Clustering for edge-cost minimization (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Criteria for Polynomial-Time (Conceptual) Clustering

Machine Learning
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Primal-Dual Approximation Algorithms for Metric Facility Location and k-Median Problems

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Testing of clustering

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
On clusterings-good, bad and spectral

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Clustering data streams

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science

Approximate clustering via core-sets

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Projective clustering in high dimensions using core-sets

Proceedings of the eighteenth annual symposium on Computational geometry
Search and Classification of High Dimensional Data

APPROX '02 Proceedings of the 5th International Workshop on Approximation Algorithms for Combinatorial Optimization
Exact and Approximate Testing/Correcting of Algebraic Functions: A Survey

Theoretical Aspects of Computer Science, Advanced Lectures [First Summer School on Theoretical Aspects of Computer Science, Tehran, Iran, July 2000]
Maintaining variance and k-medians over data stream windows

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Clustering Data Streams: Theory and Practice

IEEE Transactions on Knowledge and Data Engineering
Better streaming algorithms for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Approximation schemes for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Algorithms column: sublinear time algorithms

ACM SIGACT News
Estimating the weight of metric minimum spanning trees in sublinear-time

STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Optimal Time Bounds for Approximate Clustering

Machine Learning
A k-Median Algorithm with Running Time Independent of Data Size

Machine Learning
A New Conceptual Clustering Framework

Machine Learning
Labeling Unclustered Categorical Data into Clusters Based on the Important Attribute Values

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Self-improving algorithms

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
On k-Median clustering in high dimensions

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
A fast k-means implementation using coresets

Proceedings of the twenty-second annual symposium on Computational geometry
Online geometric reconstruction

Proceedings of the twenty-second annual symposium on Computational geometry
Data streams: algorithms and applications

Foundations and Trends® in Theoretical Computer Science
A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering

Machine Learning
Quantum clustering algorithms

Proceedings of the 24th international conference on Machine learning
Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms

Theoretical Computer Science
Approximate clustering without the approximation

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Coresets and approximate clustering for Bregman divergences

SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Property Testing: A Learning Theory Perspective

Foundations and Trends® in Machine Learning
A sublinear-time approximation scheme for bin packing

Theoretical Computer Science
Scalable Clustering for Mining Local-Correlated Clusters in High Dimensions and Large Datasets

Fundamenta Informaticae - Intelligent Data Analysis in Granular Computing
Sublinear-time algorithms

Property testing
Sublinear-time algorithms

Property testing
Online geometric reconstruction

Journal of the ACM (JACM)
Min-sum clustering of protein sequences with limited distance information

SIMBAD'11 Proceedings of the First international conference on Similarity-based pattern recognition
Optimal time bounds for approximate clustering

UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
Coresets for discrete integration and clustering

FSTTCS'06 Proceedings of the 26th international conference on Foundations of Software Technology and Theoretical Computer Science
A scalable supervised algorithm for dimensionality reduction on streaming data

Information Sciences: an International Journal
Active clustering of biological sequences

The Journal of Machine Learning Research
Sublinear Time Algorithms

SIAM Journal on Discrete Mathematics
Quantum speed-up for unsupervised learning

Machine Learning

Abstract

Clustering is of central importance in a number of disciplines including Machine Learning, Statistics, and Data Mining. This paper has two foci: (1) It describes how existing algorithms for clustering can benefit from simple sampling techniques arising from work in statistics [Pol84]. (2) It motivates and introduces a new model of clustering that is in the spirit of the “PAC (probably approximately correct)” learning model, and gives examples of efficient PAC-clustering algorithms.
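
The first focus of the abstract, running an existing clustering algorithm on a random sample rather than on the full input, might be sketched as follows. This is a minimal illustration and not the paper's algorithm: the uniform sampling scheme, the brute-force discrete k-median solver standing in for "an existing algorithm", the Euclidean metric, the function names, and the sample size are all assumptions chosen for the example. The paper's actual sample-size bounds are not reproduced here.

```python
import itertools
import math
import random


def kmedian_cost(points, centers):
    """Average distance from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points) / len(points)


def brute_force_kmedian(points, k):
    """Exact discrete k-median: try every k-subset of the points as centers.

    Hypothetical stand-in for an existing clustering algorithm;
    only feasible on small inputs such as a sample.
    """
    return min(
        itertools.combinations(points, k),
        key=lambda centers: kmedian_cost(points, centers),
    )


def sample_then_cluster(points, k, sample_size, seed=0):
    """Cluster a uniform random sample instead of the full data set.

    The expensive solver only ever sees `sample_size` points, so its cost
    does not grow with len(points). Uniform sampling is an assumption here.
    """
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    return brute_force_kmedian(sample, k)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two well-separated synthetic blobs (illustrative data only).
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)]
    centers = sample_then_cluster(data, k=2, sample_size=30)
    # Evaluate the sample-derived centers against the full data set.
    print(centers, kmedian_cost(data, centers))
```

The centers are chosen from the sample alone, and the full data set is touched only to evaluate the resulting cost, which is the quantity a sampling guarantee would bound.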