Approximate computation of multidimensional aggregates of sparse data using wavelets

Authors:
Jeffrey Scott Vitter;Min Wang
Affiliations:
Center for Geometric Computing and Department of Computer Science, Duke University, Durham, NC;Center for Geometric Computing and Department of Computer Science, Duke University, Durham, NC
Venue:
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Year:
1999

Abstract

Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries.

In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a small number of I/Os, depending upon the desired accuracy.

We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.
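The general idea of a wavelet-based compact representation can be illustrated with a minimal one-dimensional sketch. This is not the paper's algorithm: the paper's construction is multidimensional and I/O-efficient, whereas the code below assumes a small in-memory array whose length is a power of two, uses the unnormalized Haar wavelet, and keeps the m coefficients with the largest normalized magnitude. All function names (`haar_decompose`, `haar_reconstruct`, `compact`, `approx_range_sum`) are hypothetical and chosen only for this illustration.

```python
import math


def haar_decompose(values):
    """Unnormalized Haar transform; len(values) must be a power of two.

    Returns [overall average, detail coefficients from coarse to fine].
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs = []
    current = list(values)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # finer levels end up further right
        current = averages
    return current + coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        nxt = []
        for a, d in zip(current, details):
            nxt.extend((a + d, a - d))
        current = nxt
    return current


def compact(coeffs, m):
    """Keep at most m coefficients, ranked by normalized magnitude.

    The weight sqrt(support) puts coefficients from different
    resolution levels on a comparable scale (an assumption of this
    sketch, not a detail taken from the abstract).
    """
    n = len(coeffs)

    def weight(i):
        support = n if i == 0 else n >> (i.bit_length() - 1)
        return abs(coeffs[i]) * math.sqrt(support)

    keep = sorted(range(n), key=weight, reverse=True)[:m]
    return {i: coeffs[i] for i in keep if coeffs[i] != 0}


def approx_range_sum(kept, n, lo, hi):
    """Approximate sum of entries lo..hi (inclusive) from kept coefficients."""
    dense = [kept.get(i, 0.0) for i in range(n)]
    return sum(haar_reconstruct(dense)[lo:hi + 1])


# A sparse toy array: most cells are empty.
data = [0, 0, 5, 0, 0, 0, 0, 3]
coeffs = haar_decompose(data)
exact = approx_range_sum(compact(coeffs, len(data)), len(data), 2, 7)
rough = approx_range_sum(compact(coeffs, 3), len(data), 2, 7)
print(exact, rough)
```

Raising `m` trades space for accuracy, which loosely mirrors the abstract's point that answers can be refined as the user demands more accuracy. With all coefficients kept, the range sum is exact.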