Recent work has demonstrated the effectiveness of the wavelet decomposition in reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate answers to queries. A major criticism of such techniques is that, unlike random sampling, for example, conventional wavelet synopses do not provide informative error guarantees on the accuracy of individual approximate answers. In fact, as this paper demonstrates, errors can vary widely (without bound) and unpredictably, even for identical queries on nearly identical values in distinct parts of the data. This lack of error guarantees severely limits the practicality of traditional wavelets as an approximate query-processing tool, because users have no way to judge the quality of any particular approximate answer. In this paper, we introduce Probabilistic Wavelet Synopses, the first wavelet-based data reduction technique with guarantees on the accuracy of individual approximate answers. Whereas earlier approaches rely on deterministic thresholding to select a set of "good" wavelet coefficients, our technique is based on a novel probabilistic thresholding scheme that assigns each coefficient a probability of being retained, based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis. We show how our scheme avoids the above pitfalls of deterministic thresholding, providing highly accurate answers for individual data values in a data vector. We propose several novel optimization algorithms for tuning our probabilistic thresholding scheme to minimize desired error metrics. Experimental results on real-world and synthetic data sets evaluate these algorithms and demonstrate the effectiveness of our probabilistic wavelet synopses in providing fast, highly accurate answers with error guarantees.
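The coin-flipping idea in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the paper's algorithm: the unnormalized Haar transform, the function names, and in particular the rule that sets each retention probability proportional to the coefficient's magnitude. The paper instead tunes these probabilities with dedicated optimization algorithms. The sketch also assumes that a retained coefficient is rescaled by dividing it by its retention probability, so that its expected value equals the original coefficient.

```python
import random


def haar_decompose(data):
    """Unnormalized Haar transform (pairwise averages and half-differences).

    Assumes len(data) is a power of two. Returns [overall average,
    coarsest detail, ..., finest details].
    """
    coeffs = []
    current = list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # finer levels end up further right
        current = averages
    return current + coeffs


def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        nxt = []
        for avg, d in zip(current, details):
            nxt.extend([avg + d, avg - d])
        current = nxt
    return current


def probabilistic_threshold(coeffs, budget, rng):
    """Keep each nonzero coefficient with probability y, stored as c / y.

    ASSUMPTION (illustrative only): y = min(1, budget * |c| / sum|c|),
    so the expected number of retained coefficients is at most `budget`.
    Storing c / y makes the expected stored value equal to c.
    """
    total = sum(abs(c) for c in coeffs)
    synopsis = {}
    if total == 0:
        return synopsis
    for i, c in enumerate(coeffs):
        if c == 0:
            continue
        y = min(1.0, budget * abs(c) / total)
        if rng.random() < y:  # the coin flip
            synopsis[i] = c / y
    return synopsis


def expand_synopsis(synopsis, n):
    """Turn the sparse synopsis back into a length-n coefficient vector."""
    return [synopsis.get(i, 0.0) for i in range(n)]


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(data)
synopsis = probabilistic_threshold(coeffs, budget=3, rng=random.Random(0))
approx = haar_reconstruct(expand_synopsis(synopsis, len(coeffs)))
print("coefficients:", coeffs)
print("synopsis:", synopsis)
print("approximate data:", approx)
```

Because each stored value is unbiased under these assumptions, every reconstructed data value is unbiased as well, since reconstruction is a linear combination of coefficients. The retention probabilities then control only the variance of each answer, and that variance is what an optimization step would aim to reduce.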