Summarization is an important task in data mining. A major challenge in recent years has been the efficient construction of fixed-space synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum-error metric. Histograms and several hierarchical techniques have been proposed for this problem. However, their time and/or space complexities remain impractically high and depend not only on the data set size $n$, but also on the space budget $B$. These handicaps stem from a requirement to tabulate all allocations of synopsis space to different regions of the data. In this paper we develop an alternative methodology that overcomes these deficiencies by applying the solution to the dual problem: given a maximum allowed error, determine the minimum-space synopsis that achieves it. Compared to the state of the art, our histogram construction algorithm reduces time complexity by (at least) a factor of $\frac{B \log^2 n}{\log \varepsilon^*}$, and our hierarchical synopsis algorithm reduces complexity by (at least) a factor of $\frac{\log^2 B}{\log \varepsilon^* + \log n}$ in time and $B\left(1 - \frac{\log B}{\log n}\right)$ in space, where $\varepsilon^*$ is the optimal error. These complexity advantages offer both the space efficiency and the scalability that previous approaches lacked. We verify the benefits of our approach experimentally.
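To make the dual formulation concrete, the following is a minimal sketch of the idea for a one-dimensional histogram under a maximum absolute error metric. It is not the algorithm of the paper: the function names (`min_buckets`, `histogram_for_budget`, `max_error`), the choice of the mid-range value as each bucket's representative, and the tolerance-based binary search are all assumptions made for illustration. The dual problem is solved by a single greedy sweep, and the space-bounded problem is then answered by searching over candidate error values rather than tabulating space allocations.

```python
from typing import List, Sequence, Tuple

# (start index, end index inclusive, representative value); hypothetical layout
Bucket = Tuple[int, int, float]


def min_buckets(data: Sequence[float], eps: float) -> List[Bucket]:
    """Dual problem: fewest contiguous buckets with max absolute error <= eps.

    Assumes non-empty data. A bucket is extended greedily while half of its
    value range stays within eps; its representative is the mid-range value.
    """
    buckets: List[Bucket] = []
    start = 0
    lo = hi = data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        if (new_hi - new_lo) / 2 > eps:
            # Adding data[i] would break the error bound: close the bucket.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, data[i], data[i]
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


def histogram_for_budget(data: Sequence[float], budget: int,
                         tol: float = 1e-6) -> List[Bucket]:
    """Space-bounded problem via the dual: binary search on the error.

    Assumes budget >= 1. Returns a histogram with at most `budget` buckets
    whose error is within `tol` of the smallest feasible error.
    """
    lo, hi = 0.0, (max(data) - min(data)) / 2
    best = min_buckets(data, hi)  # a single bucket is always feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2
        candidate = min_buckets(data, mid)
        if len(candidate) <= budget:
            best, hi = candidate, mid  # feasible: try a smaller error
        else:
            lo = mid  # too many buckets: allow a larger error
    return best


def max_error(data: Sequence[float], buckets: List[Bucket]) -> float:
    """Maximum absolute deviation of any value from its bucket's representative."""
    return max(abs(data[i] - v) for s, e, v in buckets for i in range(s, e + 1))


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12, 30, 31]
    hist = histogram_for_budget(values, budget=3)
    print(hist, max_error(values, hist))
```

In this sketch each dual evaluation is one linear pass over the data, independent of the budget, and the number of evaluations depends on the range of error values searched rather than on $B$.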