In this paper we investigate algorithms and lower bounds for summarization problems over a single-pass data stream. In particular, we focus on histogram construction and K-center clustering. We provide a simple framework that improves upon all previous algorithms for these problems in the space bound, the approximation factor, or the running time. The framework uses a notion of "streamstrapping," in which summaries created for the initial prefixes of the data are used to develop better approximation algorithms. We also prove the first non-trivial lower bounds for these problems. We show that under the stricter requirement that an algorithm accurately approximate the error of every bucket or every cluster it produces, these upper bounds are almost the best possible. This property of accurate estimation holds for all known upper bounds on these problems.
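To make the single-pass setting concrete, the following is a minimal sketch of a classic doubling-style heuristic for streaming K-center. It is not the streamstrapping framework described above and makes no claim to match its bounds; the class name `StreamingKCenter`, its `add` method, and the radius guess `tau` are hypothetical names chosen for illustration. The sketch keeps at most k centers and maintains the invariant that every point seen so far lies within `4 * tau` of some retained center.

```python
import itertools
import math


class StreamingKCenter:
    """Illustrative one-pass K-center summary (doubling heuristic, an assumption
    of this sketch, not the paper's algorithm). Stores at most k centers."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.tau = 0.0      # current radius guess; only ever grows
        self.centers = []   # retained centers (tuples of coordinates)

    def add(self, point):
        # A point already within 2*tau of a center needs no new center.
        if any(math.dist(point, c) <= 2 * self.tau for c in self.centers):
            return
        self.centers.append(point)
        # Too many centers: raise the guess and merge nearby centers.
        while len(self.centers) > self.k:
            if self.tau == 0.0:
                # First overflow: start from half the closest pair distance.
                self.tau = min(
                    math.dist(a, b)
                    for a, b in itertools.combinations(self.centers, 2)
                ) / 2
            else:
                self.tau *= 2
            kept = []
            for c in self.centers:
                # Keep c only if it is farther than 2*tau from every kept center.
                if all(math.dist(c, q) > 2 * self.tau for q in kept):
                    kept.append(c)
            self.centers = kept


# Usage: feed points one at a time; memory stays O(k) regardless of stream length.
stream = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (3.0,), (19.5,)]
summary = StreamingKCenter(k=2)
for p in stream:
    summary.add(p)
```

Each point is examined once and then discarded unless it becomes a center, which is what distinguishes the streaming model from offline clustering. Doubling `tau` on overflow is one simple way to reuse the summary built for an earlier prefix rather than restarting from scratch.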