Hierarchical synopsis structures offer a viable alternative to traditional summarization techniques such as histograms in terms of efficiency and flexibility. Previous research on such structures has mostly focused on a single model, based on the Haar wavelet decomposition. In previous work, we introduced a more refined, wavelet-inspired hierarchical index structure for synopsis construction: the Haar+ tree. This structure has two chief advantages. First, it achieves higher synopsis quality than state-of-the-art histogram and Haar wavelet techniques when summarizing data sets with sharp discontinuities. Second, thanks to its capacity to delimit the search space, Haar+ synopsis construction runs in time linear in the size of the data set for any monotonic distributive error metric. Contemporaneous research has introduced another hierarchical synopsis structure, the compact hierarchical histogram (CHH). In this article, we elaborate on both structures. First, we formally prove that the CHH, in its default binary-hierarchy form, is a simplified variant of a Haar+ tree. We then focus on the summarization problem, for both hierarchical synopsis structures, in which an error guarantee expressed by a maximum-error metric is required. We show that this problem is most efficiently solved through its dual, space-minimization counterpart, which can also achieve optimal quality. Here it pays to specialize the algorithm for each structure: our algorithm for optimal-quality maximum-error CHH requires low polynomial time, whereas optimal-quality Haar+ synopses for maximum-error metrics are constructed in exponential time; hence, we also develop a low-polynomial-time approximation scheme for the maximum-error Haar+ case. Furthermore, we extend our approach for both general-error and maximum-error Haar+ synopses to arbitrary dimensionality.
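The idea of solving a space-bounded, maximum-error summarization problem through its dual can be illustrated with a minimal sketch. The sketch uses a plain bucket histogram, not a Haar+ tree or a CHH, and the function names `min_buckets` and `best_error_for_budget` are hypothetical. It assumes a non-empty input and the maximum absolute error metric: a greedy pass solves the error-bounded (space-minimization) problem, and a binary search over candidate error values turns that solver into one for the space-bounded problem.

```python
from typing import List, Sequence, Tuple

Bucket = Tuple[int, int, float]  # (start index, end index inclusive, representative)


def min_buckets(data: Sequence[float], eps: float) -> List[Bucket]:
    """Dual (space-minimization) problem on a plain histogram: fewest
    contiguous buckets whose maximum absolute error is at most eps.
    Assumes data is non-empty."""
    buckets: List[Bucket] = []
    start = 0
    lo = hi = data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        if (new_hi - new_lo) / 2 > eps:
            # Adding data[i] would break the bound: close the current bucket.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, data[i], data[i]
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


def best_error_for_budget(data: Sequence[float], budget: int) -> Tuple[float, List[Bucket]]:
    """Space-bounded problem solved via the dual: binary-search the smallest
    candidate error for which min_buckets fits within the bucket budget."""
    n = len(data)
    candidates = set()
    # The optimal error is half the value range of some contiguous interval.
    for i in range(n):
        lo = hi = data[i]
        for j in range(i, n):
            lo, hi = min(lo, data[j]), max(hi, data[j])
            candidates.add((hi - lo) / 2)
    cands = sorted(candidates)
    left, right = 0, len(cands) - 1
    while left < right:
        mid = (left + right) // 2
        if len(min_buckets(data, cands[mid])) <= budget:
            right = mid  # feasible: try a smaller error
        else:
            left = mid + 1  # infeasible: a larger error is needed
    eps = cands[left]
    return eps, min_buckets(data, eps)


if __name__ == "__main__":
    eps, buckets = best_error_for_budget([1, 1, 2, 9, 10, 10, 3, 3], budget=3)
    print(eps, buckets)
```

The binary search relies on the bucket count being non-increasing in the error bound. Enumerating all candidate errors takes quadratic time and is done here only to keep the example exact and short.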
In our experimental study, (i) we confirm the theoretically expected superiority of Haar+ synopses over Haar wavelet methods in both construction time and achieved quality for representative error metrics; (ii) we demonstrate that Haar+ synopses are also constructed faster than optimal plain histograms and, moreover, achieve higher synopsis quality on highly discontinuous data sets; such an advantage of a hierarchical synopsis structure over a histogram had been expressed intuitively, but never verified experimentally; and (iii) we show that Haar+ synopsis quality exceeds that of a CHH.