Approximate query processing using wavelets

Authors:
Kaushik Chakrabarti;Minos Garofalakis;Rajeev Rastogi;Kyuseok Shim
Affiliations:
University of Illinois, 1304 W. Springfield Ave., Urbana, IL 61801, USA/ e-mail: kaushikc@cs.uiuc.edu;Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA/ e-mail: {minos,rastogi}@bell-labs.com;Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974, USA/ e-mail: {minos,rastogi}@bell-labs.com;KAIST and AITrc, 373-1 Kusong-Dong, Yusong-Gu, Taejon 305-701, Korea/ e-mail: shim@cs.kaist.ac.kr
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2001

Abstract

Approximate query processing has emerged as a cost-effective approach for dealing with the huge data volumes and stringent response-time requirements of today's decision support systems (DSS). Most work in this area, however, has so far been limited in its query processing scope, typically focusing on specific forms of aggregate queries. Furthermore, conventional approaches based on sampling or histograms appear to be inherently limited when it comes to approximating the results of complex queries over high-dimensional DSS data sets. In this paper, we propose the use of multi-dimensional wavelets as an effective tool for general-purpose approximate query processing in modern, high-dimensional applications. Our approach is based on building wavelet-coefficient synopses of the data and using these synopses to provide approximate answers to queries. We develop novel query processing algorithms that operate directly on the wavelet-coefficient synopses of relational tables, allowing us to process arbitrarily complex queries entirely in the wavelet-coefficient domain. This guarantees extremely fast response times since our approximate query execution engine can do the bulk of its processing over compact sets of wavelet coefficients, essentially postponing the expansion into relational tuples until the end-result of the query. We also propose a novel wavelet decomposition algorithm that can build these synopses in an I/O-efficient manner. Finally, we conduct an extensive experimental study with synthetic as well as real-life data sets to determine the effectiveness of our wavelet-based approach compared to sampling and histograms. Our results demonstrate that our techniques: (1) provide approximate answers of better quality than either sampling or histograms; (2) offer query execution-time speedups of more than two orders of magnitude; and (3) guarantee extremely fast synopsis construction times that scale linearly with the size of the data.
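
To make the idea of a wavelet-coefficient synopsis concrete, the following is a minimal one-dimensional sketch, not the authors' algorithm. It assumes the Haar wavelet (the abstract does not name the basis), an input whose length is a power of two, and a simple keep-the-k-largest thresholding rule. The function names `haar_decompose`, `build_synopsis`, and `reconstruct` are hypothetical, chosen for illustration only. Unlike the approach described above, which processes queries directly in the coefficient domain over multi-dimensional data, this sketch expands the synopsis back into values before answering anything.

```python
import math
from typing import Dict, List


def haar_decompose(values: List[float]) -> List[float]:
    """Unnormalized Haar transform by repeated pairwise averaging and differencing.

    Returns [overall average, detail coefficients from coarsest to finest].
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(values)
    details: List[float] = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up in front
        current = averages
    return current + details


def build_synopsis(coeffs: List[float], k: int) -> Dict[int, float]:
    """Keep the k coefficients with the largest support-normalized magnitude."""
    n = len(coeffs)

    def importance(i: int) -> float:
        level = max(i, 1).bit_length() - 1  # index 0 shares the coarsest level
        support = n / (2 ** level)          # number of values the coefficient spans
        return abs(coeffs[i]) * math.sqrt(support)

    kept = sorted(range(n), key=importance, reverse=True)[:k]
    return {i: coeffs[i] for i in kept}


def reconstruct(synopsis: Dict[int, float], n: int) -> List[float]:
    """Expand a sparse synopsis into n approximate values (dropped coefficients are 0)."""
    coeffs = [synopsis.get(i, 0.0) for i in range(n)]
    current = [coeffs[0]]
    pos = 1
    while pos < n:
        width = len(current)
        level = coeffs[pos:pos + width]
        expanded: List[float] = []
        for avg, diff in zip(current, level):
            expanded.extend([avg + diff, avg - diff])
        pos += width
        current = expanded
    return current


data = [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
coeffs = haar_decompose(data)
synopsis = build_synopsis(coeffs, k=4)
approx = reconstruct(synopsis, len(data))
print(coeffs)
print(approx)
```

Because the overall average is retained in this example, the total of the approximate values equals the total of the original data even though half of the coefficients are dropped; individual values are only approximated.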