Deterministic wavelet thresholding for maximum-error metrics

Authors:
Minos Garofalakis; Amit Kumar
Affiliations:
Bell Laboratories, Murray Hill, NJ; Indian Institute of Technology, New Delhi, India
Venue:
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2004

Abstract

Several studies have demonstrated the effectiveness of the wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries. While conventional wavelet synopses are based on greedily minimizing the overall root-mean-squared (i.e., L2-norm) error in the data approximation, recent work has demonstrated that such synopses can suffer from important problems, including severe bias and wide variance in the quality of the data reconstruction, and lack of non-trivial guarantees for individual approximate answers. As a result, probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control other approximation-error metrics, such as the maximum relative error in data-value reconstruction, which is arguably the most important for approximate query answers and meaningful error guarantees.

One of the main open problems posed by this earlier work is whether it is possible to design efficient deterministic wavelet-thresholding algorithms for minimizing non-L2 error metrics that are relevant to approximate query processing systems, such as maximum relative or maximum absolute error. Obviously, such algorithms can guarantee better wavelet synopses and avoid the pitfalls of probabilistic techniques (e.g., "bad" coin-flip sequences) leading to poor solutions. In this paper, we address this problem and propose novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing maximum-error metrics. We introduce an optimal low polynomial-time algorithm for one-dimensional wavelet thresholding; our algorithm is based on a new Dynamic-Programming (DP) formulation, and can be employed to minimize the maximum relative or absolute error in the data reconstruction.
Unfortunately, directly extending our one-dimensional DP algorithm to multi-dimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, we also introduce novel, polynomial-time approximation schemes (with tunable approximation guarantees for the target maximum-error metric) for deterministic wavelet thresholding in multiple dimensions.
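
To make the objective in the abstract concrete, the following is a minimal Python sketch of one-dimensional Haar thresholding. It is not the paper's DP algorithm: it compares the conventional greedy L2 choice of coefficients against an exhaustive search over coefficient subsets, which is only feasible for tiny inputs. All function names, the sample data, the unnormalized averaging/differencing form of the Haar transform, and the sanity bound used in the relative-error denominator are assumptions made for illustration.

```python
from itertools import combinations

def haar_decompose(data):
    """Unnormalized Haar transform of a power-of-two-length list.
    Returns [overall average, detail coefficients from coarse to fine]."""
    coeffs, current = [], list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        current = [(current[i] + current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # coarser levels end up in front
    return current + coeffs

def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    current, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
    return current

def max_error(data, approx, relative=False, sanity=1.0):
    """Maximum absolute error, or maximum relative error with an
    assumed sanity bound that keeps small values from dominating."""
    return max(
        abs(d - a) / (max(abs(d), sanity) if relative else 1.0)
        for d, a in zip(data, approx)
    )

def synopsis_error(data, coeffs, keep, relative=False):
    """Error of the synopsis that retains only the coefficients in `keep`."""
    kept = [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
    return max_error(data, haar_reconstruct(kept), relative)

def greedy_l2(coeffs, budget):
    """Conventional thresholding: keep the `budget` coefficients that are
    largest after normalizing by the square root of their support size."""
    n = len(coeffs)
    def weight(i):
        level = 0 if i == 0 else i.bit_length() - 1
        return abs(coeffs[i]) * (n / 2 ** level) ** 0.5
    return set(sorted(range(n), key=weight, reverse=True)[:budget])

def brute_force_max_error(data, coeffs, budget, relative=False):
    """Exhaustive search over all subsets of at most `budget` coefficients.
    Exponential time; a stand-in for an optimal deterministic algorithm."""
    best_err, best_keep = None, None
    for size in range(budget + 1):
        for keep in combinations(range(len(coeffs)), size):
            err = synopsis_error(data, coeffs, set(keep), relative)
            if best_err is None or err < best_err:
                best_err, best_keep = err, set(keep)
    return best_err, best_keep

data = [2, 2, 0, 2, 3, 5, 4, 4]  # illustrative input, not from the paper
coeffs = haar_decompose(data)
greedy_err = synopsis_error(data, coeffs, greedy_l2(coeffs, 3))
optimal_err, optimal_keep = brute_force_max_error(data, coeffs, 3)
print("greedy L2 max abs error:", greedy_err)
print("brute-force max abs error:", optimal_err, "keeping", sorted(optimal_keep))
```

The exhaustive search can never do worse than the greedy L2 selection on the maximum-error metric, since the greedy subset is one of the candidates it examines. Passing `relative=True` switches both routines to the maximum relative error.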