Evaluating holistic aggregators efficiently for very large datasets

Authors:
Lixin Fu; Sanguthevar Rajasekaran
Affiliations:
Division of Computer Science, Department of Mathematical Sciences, University of North Carolina at Greensboro, Bryan 383, Greensboro, NC 27402-6170, USA; CSE, University of Connecticut, 191 Auditorium Road, U-155, Storrs, CT 06269-3155, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2004

Abstract

In data warehousing applications, many OLAP queries evaluate holistic aggregators such as “top n,” median, and quantiles. In this paper, we present a novel approach called dynamic bucketing to evaluate these aggregators efficiently. We partition the data into equiwidth buckets and, as needed, further partition dense buckets into subbuckets, allocating and reclaiming memory space along the way. The bucketing process adapts dynamically to the order and distribution of the input data. The histograms of the buckets and subbuckets are stored in a new data structure called a structure tree. We also generalize a recent selection algorithm based on regular sampling and extend its analysis. We compare our new algorithms with this generalized algorithm and with several other recent algorithms. Experimental results show that our new algorithms significantly outperform prior ones in both runtime and accuracy.
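
To make the bucketing idea concrete, the following is a minimal sketch of exact selection by repeated equiwidth bucketing. It is not the dynamic bucketing algorithm or the structure tree from the paper: it refines only the single bucket that contains the target rank and uses one extra pass per refinement level, whereas the abstract describes subpartitioning dense buckets adaptively as data arrives. The function name select_by_bucketing, the scan callback (standing in for one sequential pass over the dataset), and the parameters num_buckets and max_in_memory are all assumptions made for illustration.

```python
from typing import Callable, Iterable


def select_by_bucketing(
    scan: Callable[[], Iterable[float]],
    rank: int,
    num_buckets: int = 16,
    max_in_memory: int = 64,
) -> float:
    """Return the item with 0-based `rank` in sorted order.

    `scan` (hypothetical) returns a fresh iterable over the data on each
    call, standing in for one sequential pass over a large dataset.
    """
    # First pass: item count and value range.
    n = 0
    lo = float("inf")
    hi = float("-inf")
    for x in scan():
        n += 1
        lo = min(lo, x)
        hi = max(hi, x)
    if not 0 <= rank < n:
        raise IndexError("rank out of range")

    below = 0  # number of items strictly smaller than lo
    count = n  # number of items in [lo, hi]
    while count > max_in_memory and lo < hi:
        width = (hi - lo) / num_buckets
        if width == 0.0:  # range too narrow to split any further
            break
        counts = [0] * num_buckets
        mins = [float("inf")] * num_buckets
        maxs = [float("-inf")] * num_buckets
        # One pass: equiwidth histogram over the current range only.
        for x in scan():
            if lo <= x <= hi:
                b = min(int((x - lo) / width), num_buckets - 1)
                counts[b] += 1
                mins[b] = min(mins[b], x)
                maxs[b] = max(maxs[b], x)
        # Walk the histogram to the bucket that holds the target rank.
        b = 0
        while below + counts[b] <= rank:
            below += counts[b]
            b += 1
        # Narrow the range to the values actually seen in that bucket.
        lo, hi, count = mins[b], maxs[b], counts[b]

    if lo == hi:  # every remaining item has the same value
        return lo
    # The surviving range is small enough to sort in memory.
    survivors = sorted(x for x in scan() if lo <= x <= hi)
    return survivors[rank - below]


data = [(i * 7919) % 1000 for i in range(1000)]
median = select_by_bucketing(lambda: iter(data), rank=len(data) // 2)
```

Each refinement narrows the range to the minimum and maximum values actually observed in the chosen bucket, so the set of surviving items shrinks on every pass and runs of duplicate values end the loop early. Trading extra passes for simplicity is a choice of this sketch only; it says nothing about how the paper's algorithms bound their number of passes.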