Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times, since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, postponing the expansion into relational tuples until the final result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data.
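To make the notion of a wavelet-coefficient synopsis concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a one-dimensional frequency vector whose length is a power of two, uses the unnormalized Haar transform (pairwise averages and half-differences), and keeps the `budget` coefficients with the largest level-normalized magnitude. All function names (`haar_decompose`, `build_synopsis`, `approx_range_sum`) are hypothetical. Unlike the approach described above, which works on multi-dimensional data and processes queries in the coefficient domain, this sketch simply expands the synopsis before answering a range-sum query.

```python
import math


def haar_decompose(values):
    """Unnormalized Haar transform: [overall average, details coarse-to-fine].

    Assumption: len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(values)
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels go in front
        current = averages
    return current + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose, one resolution level at a time."""
    current = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        width = len(current)
        level = coeffs[pos:pos + width]
        expanded = []
        for avg, diff in zip(current, level):
            expanded.extend((avg + diff, avg - diff))
        current = expanded
        pos += width
    return current


def build_synopsis(values, budget):
    """Keep the `budget` coefficients with the largest normalized magnitude."""
    coeffs = haar_decompose(values)

    def weight(i):
        # Index 0 (average) and index 1 (coarsest detail) are both level 0.
        level = 0 if i == 0 else i.bit_length() - 1
        return abs(coeffs[i]) / math.sqrt(2 ** level)

    keep = sorted(range(len(coeffs)), key=weight, reverse=True)[:budget]
    return {i: coeffs[i] for i in keep if coeffs[i] != 0}, len(coeffs)


def approx_range_sum(synopsis, lo, hi):
    """Approximate sum of values[lo:hi]; dropped coefficients count as zero."""
    kept, n = synopsis
    coeffs = [kept.get(i, 0.0) for i in range(n)]
    return sum(haar_reconstruct(coeffs)[lo:hi])
```

For example, `build_synopsis([2, 2, 0, 2, 3, 5, 4, 4], budget=4)` stores four of the eight coefficients, and `approx_range_sum` can then be called on the result with any half-open index range. Because the overall average is among the retained coefficients here, the sum over the full range is reproduced exactly, while narrower ranges are only approximated.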