Data Reduction by Partial Preaggregation

Authors:
Affiliations:
Venue:
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Year:
2002

Citing 0
Cited 3

Spatiotemporal Aggregate Computation: A Survey

IEEE Transactions on Knowledge and Data Engineering
Estimating the output cardinality of partial preaggregation with a measure of clusteredness

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Query-aware partitioning for monitoring massive network data streams

Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Quantified Score

Hi-index	0.00

Visualization

Abstract

Partial preaggregation is a simple data reduction operator that can be applied to aggregation queries. Whenever we group and aggregate on a column set G, we can preaggregate on any column set that functionally determines G. Preaggregation can be used, for example, to reduce the input size to a join. Regular aggregation reduces the input to one record per group. Partial preaggregation exploit the fact that preaggregation need not be complete -if multiple record happen to be output for a group, they will be combined into the same group by the final aggregation. This paper describes a straightforward hash-based algorithm for partial preaggregation, discusses where it can be applied, and derives a mathematical model for estimating the output size. The effectiveness of the technique and the accuracy of the model are shown on both artificial and real data. It is also shown how to reduce memory requirement by combining partial preaggregation with the input phase of a subsequent join or sort operator. Partial preaggregation has been implemented, in part, in Microsoft SQL Server.