Approximate range-sum query answering on data cubes with probabilistic guarantees

Authors:
Alfredo Cuzzocrea; Wei Wang
Affiliations:
Department of Electronics, Computer Science, and Systems, University of Calabria, Cosenza, Italy 87036; School of Computer Science and Engineering, University of New South Wales & National ICT Australia, Sydney, Australia 2052
Venue:
Journal of Intelligent Information Systems
Year:
2007

Abstract

Approximate range aggregate queries are one of the most frequent and useful kinds of queries for Decision Support Systems (DSS), as they are widely used in many data analysis tasks. Traditionally, sampling-based techniques have been proposed to tackle this problem. However, their effectiveness degrades when the underlying data distribution is skewed. Another approach, based on outlier management, can limit the effect of data skew but fails to address other requirements of approximate range aggregate queries, such as error guarantees and query processing efficiency. In this paper, we present a technique that efficiently provides approximate answers to range aggregate queries on OLAP data cubes, with theoretical guarantees on the errors. Our basic idea is to build different data structures to manage outliers and the rest of the data. Carefully chosen outliers are organized in a quad-tree based indexing data structure to provide efficient access for query processing. A query-workload adaptive, tree-like synopsis data structure, called Tunable Partition-Tree (TP-Tree), is proposed to organize samples extracted from non-outlier data. Our experiments demonstrate the merits of our technique in comparison with previous well-known techniques.
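
The idea of answering a range-sum query from exact outliers plus a sample of the remaining cells can be illustrated with a minimal sketch. It is not the paper's implementation: a plain dictionary stands in for the quad-tree outlier index, a flat list of uniformly sampled cells stands in for the TP-Tree, and the error bound uses Hoeffding's inequality as one standard way to obtain a probabilistic guarantee (whether this matches the paper's own bound is an assumption). All names (`build_synopsis`, `approx_range_sum`, `outlier_threshold`, `delta`) are hypothetical.

```python
import math
import random


def build_synopsis(cube, outlier_threshold, sample_size, seed=0):
    """Split a 2-D cube into exact outliers and a uniform sample of the rest.

    cube maps (x, y) -> numeric measure. The threshold rule for picking
    outliers is an illustrative assumption, not the paper's selection method.
    """
    outliers = {c: v for c, v in cube.items() if abs(v) > outlier_threshold}
    rest = [(c, v) for c, v in cube.items() if c not in outliers]
    rng = random.Random(seed)
    sample = rng.sample(rest, min(sample_size, len(rest)))
    values = [v for _, v in rest]
    # A cell outside the query contributes 0, so the span must include 0.
    lo = min(0, min(values, default=0))
    hi = max(0, max(values, default=0))
    return {
        "outliers": outliers,      # stored exactly (quad-tree in the paper)
        "sample": sample,          # sampled non-outlier cells (TP-Tree in the paper)
        "n_rest": len(rest),       # number of non-outlier cells
        "value_span": hi - lo,     # range of one sampled contribution
    }


def in_range(cell, query):
    """query = (x_lo, x_hi, y_lo, y_hi), bounds inclusive."""
    (x, y), (x_lo, x_hi, y_lo, y_hi) = cell, query
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi


def approx_range_sum(synopsis, query, delta=0.05):
    """Return (estimate, bound): the sampled part's error is at most
    `bound` with probability at least 1 - delta (Hoeffding's inequality)."""
    exact_part = sum(v for c, v in synopsis["outliers"].items() if in_range(c, query))
    n, n_rest = len(synopsis["sample"]), synopsis["n_rest"]
    if n == 0:
        return exact_part, (0.0 if n_rest == 0 else math.inf)
    in_query = sum(v for c, v in synopsis["sample"] if in_range(c, query))
    sampled_part = (n_rest / n) * in_query
    bound = n_rest * synopsis["value_span"] * math.sqrt(math.log(2 / delta) / (2 * n))
    return exact_part + sampled_part, bound
```

Under these assumptions, storing outliers exactly shrinks `value_span`, which tightens the bound for the same sample size. That is the motivation the abstract gives for managing outliers separately from the rest of the data.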