On the space---time of optimal, approximate and streaming algorithms for synopsis construction problems

Authors:
Sudipto Guha
Affiliations:
Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, USA 19104
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2008

Citing 44
Cited 10

A model for hierarchical memory

STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
Ten lectures on wavelets

Ten lectures on wavelets
Balancing histogram optimality and practicality for query result size estimation

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Approximation algorithms for NP-hard problems

Approximation algorithms for NP-hard problems
Online aggregation

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The Aqua approximate query answering system

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Synopsis data structures for massive data sets

Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Optimal histograms for hierarchical range queries (extended abstract)

PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Cache Memories

ACM Computing Surveys (CSUR)
A linear space algorithm for computing maximal common subsequences

Communications of the ACM
The working set model for program behavior

Communications of the ACM
Optimal and approximate computation of summary statistics for range aggregates

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Data-streams and histograms

STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
External memory algorithms and data structures: dealing with massive data

ACM Computing Surveys (CSUR)
Approximation algorithms

Approximation algorithms
Fast, small-space algorithms for approximate histogram maintenance

STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Fast algorithms for hierarchical range histogram construction

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Wavelet synopses with error guarantees

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Locally adaptive dimensionality reduction for indexing large time series databases

ACM Transactions on Database Systems (TODS)
Access path selection in a relational database management system

SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Rangesum histograms

SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Optimal Histograms with Quality Guarantees

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
What can Hierarchies do for Data Warehouses?

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Approximate Query Processing Using Wavelets

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries

Proceedings of the 27th International Conference on Very Large Data Bases
Universality of Serial Histograms

VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
Fast Incremental Maintenance of Approximate Histograms

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Histogramming Data Streams with Fast Per-Item Processing

ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
Extended wavelets for multiple measures

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
The optimization of queries in relational databases

The optimization of queries in relational databases
Probabilistic wavelet synopses

ACM Transactions on Database Systems (TODS)
Deterministic wavelet thresholding for maximum-error metrics

PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Wavelet synopsis for data streams: minimizing non-euclidean error

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Space efficiency in synopsis construction algorithms

VLDB '05 Proceedings of the 31st international conference on Very large data bases
One-pass wavelet synopses for maximum-error metrics

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Approximation algorithms for wavelet transform coding of data streams

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Approximation and streaming algorithms for histogram construction problems

ACM Transactions on Database Systems (TODS)
An evaluation of buffer management strategies for relational database systems

VLDB '85 Proceedings of the 11th international conference on Very Large Data Bases - Volume 11
The history of histograms (abridged)

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
XWAVE: optimal and approximate extended wavelets

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
REHIST: relative error histogram construction algorithms

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Hierarchical synopses with optimal error guarantees

ACM Transactions on Database Systems (TODS)
Multiplicative synopses for relative-error metrics

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Pricing guidance in ad sale negotiations: the PrintAds example

Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising
Fast and effective histogram construction

Proceedings of the 18th ACM conference on Information and knowledge management
Optimality and scalability in lattice histogram construction

Proceedings of the VLDB Endowment
Approximating Points by a Piecewise Linear Function: I

ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation
Approximating Points by a Piecewise Linear Function: II. Dealing with Outliers

ISAAC '09 Proceedings of the 20th International Symposium on Algorithms and Computation
Approximating sliding windows by cyclic tree-like histograms for efficient range queries

Data & Knowledge Engineering
A probabilistic framework for estimating the accuracy of aggregate range queries evaluated over histograms

Information Sciences: an International Journal
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

Foundations and Trends in Databases

Quantified Score

Hi-index	0.00

Visualization

Abstract

Synopses construction algorithms have been found to be of interest in query optimization, approximate query answering and mining, and over the last few years several good synopsis construction algorithms have been proposed. These algorithms have mostly focused on the running time of the synopsis construction vis-a-vis the synopsis quality. However the space complexity of synopsis construction algorithms has not been investigated as thoroughly. Many of the optimum synopsis construction algorithms are expensive in space. For some of these algorithms the space required to construct the synopsis is significantly larger than the space required to store the input. These algorithms rely on the fact that they require a smaller "working space" and most of the data can be resident on disc. The large space complexity of synopsis construction algorithms is a handicap in several scenarios. In the case of streaming algorithms, space is a fundamental constraint. In case of offline optimal or approximate algorithms, a better space complexity often makes these algorithms much more attractive by allowing them to run in main memory and not use disc, or alternately allows us to scale to significantly larger problems without running out of space. In this paper, we propose a simple and general technique that reduces space complexity of synopsis construction algorithms. As a consequence we show that the notion of "working space" proposed in these contexts is redundant. This technique can be easily applied to many existing algorithms for synopsis construction problems. We demonstrate the performance benefits of our proposal through experiments on real-life and synthetic data. We believe that our algorithm also generalizes to a broader range of dynamic programs beyond synopsis construction.