Probabilistic wavelet synopses

Authors:
Minos Garofalakis;Phillip B. Gibbons
Affiliations:
Bell Labs, Lucent Technologies, Murray Hill, New Jersey;Intel Research, Pittsburgh, Pennsylvania
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2004

Abstract

Recent work has demonstrated the effectiveness of the wavelet decomposition in reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate query answers. A major shortcoming of these existing wavelet techniques is that the quality of the approximate answers they provide varies widely, even for identical queries on nearly identical values in distinct parts of the data. As a result, users have no way of knowing whether a particular approximate answer is highly accurate or off by many orders of magnitude. In this article, we introduce Probabilistic Wavelet Synopses, the first wavelet-based data reduction technique optimized for guaranteed accuracy of individual approximate answers. Whereas previous approaches rely on deterministic thresholding for selecting the wavelet coefficients to include in the synopsis, our technique is based on a novel, probabilistic thresholding scheme that assigns each coefficient a probability of being included based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis. We show how our scheme avoids the above pitfalls of deterministic thresholding, providing unbiased, highly accurate answers for individual data values in a data vector. We propose several novel optimization algorithms for tuning our probabilistic thresholding scheme to minimize desired error metrics. Experimental results on real-world and synthetic data sets evaluate these algorithms, and demonstrate the effectiveness of our probabilistic wavelet synopses in providing fast, highly accurate answers with improved quality guarantees.
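
The coin-flipping idea in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes an unnormalized Haar transform on a vector whose length is a power of two, and it assumes that a coefficient c retained with probability y is stored as c / y so that its expected value stays c. The function names and the helper naive_probabilities are hypothetical; that helper is only a placeholder heuristic standing in for the optimization algorithms the abstract mentions.

```python
import random


def haar_decompose(data):
    """Unnormalized Haar transform via pairwise averaging and differencing.

    Assumes len(data) is a power of two. Returns the overall average
    followed by the detail coefficients, ordered coarse to fine.
    """
    values = [float(v) for v in data]
    details = []
    while len(values) > 1:
        pairs = range(0, len(values), 2)
        avgs = [(values[i] + values[i + 1]) / 2 for i in pairs]
        diffs = [(values[i] - values[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up in front
        values = avgs
    return values + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        expanded = []
        for avg, diff in zip(values, level):
            expanded.extend([avg + diff, avg - diff])
        pos += len(values)
        values = expanded
    return values


def naive_probabilities(coeffs, budget):
    """Placeholder heuristic (an assumption, not the article's method):
    retention probability proportional to |c|, capped at 1, so the
    expected number of retained coefficients is at most `budget`."""
    total = sum(abs(c) for c in coeffs)
    if total == 0:
        return [0.0] * len(coeffs)
    return [min(1.0, budget * abs(c) / total) for c in coeffs]


def probabilistic_threshold(coeffs, probs, rng):
    """Flip one biased coin per coefficient.

    A kept coefficient is rescaled to c / y, so its expectation is c and
    every reconstructed data value is unbiased. Every nonzero coefficient
    needs y > 0 for that to hold.
    """
    synopsis = []
    for c, y in zip(coeffs, probs):
        if c != 0 and y > 0 and rng.random() < y:
            synopsis.append(c / y)
        else:
            synopsis.append(0.0)
    return synopsis


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]
    coeffs = haar_decompose(data)
    probs = naive_probabilities(coeffs, budget=4)
    synopsis = probabilistic_threshold(coeffs, probs, random.Random(0))
    print("coefficients:", coeffs)
    print("approximate data:", haar_reconstruct(synopsis))
```

Averaging the reconstruction over many independent coin flips should approach the original vector, which is the unbiasedness property the abstract refers to. How the probabilities are chosen determines the variance of each answer, and that is the part the article's optimization algorithms address.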