Randomized rounding: a technique for provably good algorithms and algorithmic proofs
Combinatorica - Theory of Computing
Sequential sampling procedures for query size estimation
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Randomized algorithms
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelets for computer graphics: theory and applications
Wavelets for computer graphics: theory and applications
Approximate computation of multidimensional aggregates of sparse data using wavelets
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
WALRUS: a similarity retrieval algorithm for image databases
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Approximating multi-dimensional aggregate range queries over real attributes
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Approximation algorithms
Approximate Query Processing Using Wavelets
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Dynamic Maintenance of Wavelet-Based Histograms
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Deterministic wavelet thresholding for maximum-error metrics
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Wavelet synopsis for data streams: minimizing non-euclidean error
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Space efficiency in synopsis construction algorithms
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Approximation algorithms for wavelet transform coding of data streams
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Wavelet synopses for general error metrics
ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2004
Approximation and streaming algorithms for histogram construction problems
ACM Transactions on Database Systems (TODS)
Extended wavelets for multiple measures
ACM Transactions on Database Systems (TODS)
Quality-Aware Sampling and Its Applications in Incremental Data Mining
IEEE Transactions on Knowledge and Data Engineering
Exploiting duality in summarization with deterministic guarantees
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Dissemination of compressed historical information in sensor networks
The VLDB Journal — The International Journal on Very Large Data Bases
Efficient Process of Top-k Range-Sum Queries over Multiple Streams with Minimized Global Error
IEEE Transactions on Knowledge and Data Engineering
REHIST: relative error histogram construction algorithms
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
DAWN: an efficient framework of DCT for data with error estimation
The VLDB Journal — The International Journal on Very Large Data Bases
Hierarchical synopses with optimal error guarantees
ACM Transactions on Database Systems (TODS)
Enhancing histograms by tree-like bucket indices
The VLDB Journal — The International Journal on Very Large Data Bases
Wavelet synopsis for hierarchical range queries with workloads
The VLDB Journal — The International Journal on Very Large Data Bases
Compressed hierarchical binary histograms for summarizing multi-dimensional data
Knowledge and Information Systems
The VLDB Journal — The International Journal on Very Large Data Bases
Approximate lineage for probabilistic databases
Proceedings of the VLDB Endowment
Multiplicative synopses for relative-error metrics
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
AMID: Approximation of MultI-measured Data using SVD
Information Sciences: an International Journal
Optimality and scalability in lattice histogram construction
Proceedings of the VLDB Endowment
Journal of Intelligent Information Systems
A quad-tree based multiresolution approach for two-dimensional summary data
Information Systems
Information Sciences: an International Journal
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Collaborative image compression with error bounds in wireless sensor networks for crop monitoring
Computers and Electronics in Agriculture
Wavelet synopsis: setting unselected coefficients to zero is not optimal
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Hi-index | 0.00 |
Recent work has demonstrated the effectiveness of the wavelet decomposition in reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate query answers. A major shortcoming of these existing wavelet techniques is that the quality of the approximate answers they provide varies widely, even for identical queries on nearly identical values in distinct parts of the data. As a result, users have no way of knowing whether a particular approximate answer is highly-accurate or off by many orders of magnitude. In this article, we introduce Probabilistic Wavelet Synopses, the first wavelet-based data reduction technique optimized for guaranteed accuracy of individual approximate answers. Whereas previous approaches rely on deterministic thresholding for selecting the wavelet coefficients to include in the synopsis, our technique is based on a novel, probabilistic thresholding scheme that assigns each coefficient a probability of being included based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis. We show how our scheme avoids the above pitfalls of deterministic thresholding, providing unbiased, highly accurate answers for individual data values in a data vector. We propose several novel optimization algorithms for tuning our probabilistic thresholding scheme to minimize desired error metrics. Experimental results on real-world and synthetic data sets evaluate these algorithms, and demonstrate the effectiveness of our probabilistic wavelet synopses in providing fast, highly accurate answers with improved quality guarantees.