Approximation algorithms for wavelet transform coding of data streams

Authors:
Sudipto Guha;Boulos Harb
Affiliations:
University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA
Venue:
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Year:
2006

Abstract

Given a set of orthonormal basis functions {ψi} and a target function/vector f, the wavelet representation problem is to construct an approximation f̂ as a combination of at most B basis vectors so as to minimize some normed distance between f and f̂. The problem is well understood if the error is the mean squared error: the B coefficients of the wavelet expansion that are largest in absolute value should be retained. This strategy follows from the proof of optimality and is not a built-in constraint.

The mean squared error, however, is not the optimization criterion in several scenarios. The above easy solution to the wavelet representation problem does not carry over to ℓp for p ≠ 2, and it turns out that restricting the solution to a subset of at most B of the expansion coefficients is suboptimal compared to the best solution, which can choose arbitrary real numbers. Further, all the previous literature on non-ℓ2 errors only considered the Haar system.

In this paper we provide the first approximation schemes for the unrestricted optimization problem. We provide a lower bounding technique based on a system of inequalities. We show that a modified greedy algorithm that retains the coefficients of expansion gives an O(log n) true (factor) approximation algorithm for a wide variety of compact wavelet systems, including Haar, Daubechies, Symmlets, and Coiflets, among others. This vindicates several scaling type algorithms which are used in practice. We subsequently augment the lower bound and give an FPTAS for the Haar system. The same ideas extend to a QPTAS for the more general class of compact wavelets mentioned above. We also consider adaptive quantization problems, which are generalizations of the B-term representations.
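The gap between retaining expansion coefficients and choosing arbitrary real values can be seen on a very small input. The following is a minimal sketch, not the algorithm of the paper: it assumes the orthonormal Haar basis on a vector whose length is a power of two, and the function names (haar_forward, haar_inverse, keep_largest) and the four-point input are hypothetical choices made for illustration only. It keeps the single largest coefficient (B = 1), which is the optimal choice for the squared error, and then compares its maximum (ℓ∞) error with that of a one-term representation whose coefficient value is chosen freely.

```python
import math


def haar_forward(f):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in f]
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Scaled pairwise sums (coarser signal) and differences (details).
        avg = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:length] = avg + det
        length = half
    return out


def haar_inverse(c):
    """Inverse of haar_forward."""
    out = [float(x) for x in c]
    n = len(out)
    s = math.sqrt(2.0)
    length = 1
    while length < n:
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged += [(a + d) / s, (a - d) / s]
        out[:2 * length] = merged
        length *= 2
    return out


def keep_largest(c, b):
    """Zero out all but the b coefficients of largest absolute value."""
    order = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)
    keep = set(order[:b])
    return [c[i] if i in keep else 0.0 for i in range(len(c))]


def l2_sq_error(f, g):
    return sum((x - y) ** 2 for x, y in zip(f, g))


def linf_error(f, g):
    return max(abs(x - y) for x, y in zip(f, g))


f = [1.0, 4.0, 5.0, 6.0]  # hypothetical input
coeffs = haar_forward(f)

# Restricted solution: retain the largest expansion coefficient as is.
restricted = haar_inverse(keep_largest(coeffs, 1))

# Unrestricted solution: same basis vector, but a freely chosen value.
# A scaling coefficient of 7 reconstructs the constant vector 3.5.
unrestricted = haar_inverse([7.0, 0.0, 0.0, 0.0])

print("l2^2 error, restricted:   ", l2_sq_error(f, restricted))
print("l_inf error, restricted:  ", linf_error(f, restricted))
print("l_inf error, unrestricted:", linf_error(f, unrestricted))
```

On this input the retained coefficient reconstructs the constant 4, with maximum error 3, whereas the freely chosen value reconstructs the constant 3.5, with maximum error 2.5 (both up to floating-point rounding). The squared error of the restricted solution equals the sum of the squares of the dropped coefficients, which is why keeping the largest coefficients is optimal for that measure but not necessarily for others.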