Extended wavelets for multiple measures

Authors:
Antonios Deligiannakis;Minos Garofalakis;Nick Roussopoulos
Affiliations:
University of Athens, Athens, Greece;Intel Research, Berkeley, CA;University of Maryland, College Park, MD
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2007

Abstract

Several studies have demonstrated the effectiveness of the Haar wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate answers to user queries. Although originally designed for minimizing the overall mean-squared (i.e., L2-norm) error in the data approximation, recently proposed methods also enable the use of Haar wavelets in minimizing other error metrics, such as the relative error in data value reconstruction, which is arguably the most important for approximate query answers. Relatively little attention, however, has been paid to the problem of using wavelet synopses as an approximate query answering tool over complex tabular datasets containing multiple measures, such as those typically found in real-life OLAP applications. Existing decomposition approaches will either operate on each measure individually, or treat all measures as a vector of values and process them simultaneously. As we demonstrate in this article, these existing individual or combined storage approaches for the wavelet coefficients of different measures can easily lead to suboptimal storage utilization, resulting in drastically reduced accuracy for approximate query answers. To address this problem, in this work, we introduce the notion of an extended wavelet coefficient as a flexible, efficient storage method for wavelet coefficients over multimeasure data. We also propose novel algorithms for constructing effective (optimal or near-optimal) extended wavelet-coefficient synopses under a given storage constraint, for both sum-squared error and relative-error norms. Experimental results with both real-life and synthetic datasets validate our approach, demonstrating that our techniques consistently obtain significant gains in approximation accuracy compared to existing solutions.
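
To make the basic building block concrete, here is a minimal sketch of a single-measure Haar wavelet synopsis: decompose the data, keep only the coefficients with the largest normalized magnitude (the conventional choice for minimizing L2 error), and reconstruct approximate values. This is not the extended-coefficient algorithm of the article. The function names, the power-of-two length requirement, and the averaging/half-difference convention are assumptions chosen for illustration.

```python
def haar_decompose(values):
    """Unnormalized 1-D Haar decomposition using pairwise averages and
    half-differences. Assumes len(values) is a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [0.0] * n
    current = [float(v) for v in values]
    while len(current) > 1:
        half = len(current) // 2
        averages = [(current[2 * i] + current[2 * i + 1]) / 2 for i in range(half)]
        details = [(current[2 * i] - current[2 * i + 1]) / 2 for i in range(half)]
        coeffs[half:2 * half] = details  # detail coefficients of this level
        current = averages
    coeffs[0] = current[0]  # overall average
    return coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each average a with detail d yields (a + d, a - d)."""
    current = [coeffs[0]]
    while len(current) < len(coeffs):
        half = len(current)
        details = coeffs[half:2 * half]
        expanded = []
        for a, d in zip(current, details):
            expanded.extend((a + d, a - d))
        current = expanded
    return current


def keep_largest(coeffs, budget):
    """Zero out all but the `budget` coefficients with the largest normalized
    magnitude. A coefficient at index i is weighted by the square root of the
    number of data values it influences."""
    n = len(coeffs)

    def weight(i):
        level = 0 if i == 0 else i.bit_length() - 1
        return abs(coeffs[i]) * (n / 2 ** level) ** 0.5

    kept = set(sorted(range(n), key=weight, reverse=True)[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(data)
synopsis = keep_largest(coeffs, budget=2)
approx = haar_reconstruct(synopsis)
```

With several measures, one could run this selection per measure (individual storage) or keep or drop the coefficients of all measures at a given index together (combined storage). These are the two existing approaches that the abstract argues can waste space. The sketch above covers only the single-measure case.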