Adaptive parallel aggregation algorithms

Authors:
Ambuj Shatdal;Jeffrey F. Naughton
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison;Computer Sciences Department, University of Wisconsin-Madison
Venue:
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Year:
1995

Citing 7
Cited 31

Query evaluation techniques for large databases

ACM Computing Surveys (CSUR)
Probabilistic methods in query processing

Probabilistic methods in query processing
Parallel algorithms for the execution of relational database operations

ACM Transactions on Database Systems (TODS)
The Gamma Database Machine Project

IEEE Transactions on Knowledge and Data Engineering
A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins

VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Managing Memory to Meet Multiclass Workload Response Time Goals

VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
Parallel Algorithms and Their Implementation in MICRONET

VLDB '82 Proceedings of the 8th International Conference on Very Large Data Bases

On parallel processing of aggregate and scalar functions in object-relational DBMS

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A dynamic load balancing strategy for parallel datacube computation

Proceedings of the 2nd ACM international workshop on Data warehousing and OLAP
Extending complex ad-hoc OLAP

Proceedings of the eighth international conference on Information and knowledge management
A scalable hash ripple join algorithm

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Efficient OLAP query processing in distributed data warehouses

Information Systems - Special issue: Best papers from EDBT 2002
Efficient OLAP Query Processing in Distributed Data Warehouses

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Groupwise Processing of Relational Queries

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
On the Computation of Multidimensional Aggregates

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
OLAP Query Evaluation in a Database Cluster: A Performance Study on Intra-Query Parallelism

ADBIS '02 Proceedings of the 6th East European Conference on Advances in Databases and Information Systems
Data warehousing

Handbook of massive data sets
Optimizing data aggregation for cluster-based internet services

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
TAG: a Tiny AGgregation service for ad-hoc sensor networks

ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
The Cougar Project: a work-in-progress report

ACM SIGMOD Record
Optimizing Reduction Computations In a Distributed Environment

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
TAG: a Tiny AGgregation service for Ad-Hoc sensor networks

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
TinyDB: an acquisitional query processing system for sensor networks

ACM Transactions on Database Systems (TODS) - Special Issue: SIGMOD/PODS 2003
Partitioned optimization of complex queries

Information Systems
Adaptive Index Utilization in Memory-Resident Structural Joins

IEEE Transactions on Knowledge and Data Engineering
Incremental maintenance for non-distributive aggregate functions

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Adaptive aggregation on chip multiprocessors

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
New concepts for parallel object-relational query processing

New concepts for parallel object-relational query processing
Skew-resistant parallel processing of feature-extracting scientific user-defined functions

Proceedings of the 1st ACM symposium on Cloud computing
Towards automatic optimization of MapReduce programs

Proceedings of the 1st ACM symposium on Cloud computing
Scalable aggregation on multicore processors

Proceedings of the Seventh International Workshop on Data Management on New Hardware
Order preserving event aggregation in TBONs

EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Supporting SQL-3 aggregations on grid-based data repositories

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Hierarchical aggregation in networked data management

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
SkewTune: mitigating skew in mapreduce applications

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Adaptive MapReduce using situation-aware mappers

Proceedings of the 15th International Conference on Extending Database Technology
Self-adaptive approximate queries for large-scale information aggregation

International Journal of Web and Grid Services

Abstract

Aggregation and duplicate removal are common in SQL queries. However, in the parallel query processing literature, aggregate processing has received surprisingly little attention; furthermore, for each of the traditional parallel aggregation algorithms, there is a range of grouping selectivities where the algorithm performs poorly. In this work, we propose new algorithms that dynamically adapt, at query evaluation time, in response to observed grouping selectivities. Performance analysis via analytical modeling and an implementation on a workstation cluster shows that the proposed algorithms are able to perform well for all grouping selectivities. Finally, we study the effect of data skew and show that for certain data sets the proposed algorithms can even outperform the best of the traditional approaches.
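
The abstract does not spell out how the adaptation works, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a SUM aggregate over (key, value) rows. It also assumes a hypothetical per-node limit, `max_local_groups`, standing in for "the observed number of groups is too large for local aggregation to pay off". Each simulated node aggregates locally while it sees few distinct groups. Once the limit is exceeded, it stops pre-aggregating and forwards raw rows to the node that owns each key. The function names and the single-process simulation of nodes are assumptions made for the example.

```python
from collections import defaultdict


def adaptive_local_aggregate(rows, max_local_groups):
    """Pre-aggregate rows on one node until too many groups are observed.

    Returns (partials, forwarded): partial sums built while local
    aggregation was active, and the raw rows passed through unchanged
    after the switch. max_local_groups is a hypothetical threshold.
    """
    partials = {}
    forwarded = []
    local_mode = True
    for key, value in rows:
        if local_mode and key in partials:
            partials[key] += value
        elif local_mode and len(partials) < max_local_groups:
            partials[key] = value
        else:
            # Too many distinct groups seen: stop pre-aggregating and
            # send this row, and all later rows, on without local work.
            local_mode = False
            forwarded.append((key, value))
    return partials, forwarded


def parallel_sum(node_inputs, num_nodes, max_local_groups):
    """Simulate the merge phase: each key is owned by one node (by hash)."""
    owners = [defaultdict(int) for _ in range(num_nodes)]
    for rows in node_inputs:
        partials, forwarded = adaptive_local_aggregate(rows, max_local_groups)
        # Partial sums and raw rows are merged the same way for SUM.
        for key, value in list(partials.items()) + forwarded:
            owners[hash(key) % num_nodes][key] += value
    result = {}
    for table in owners:
        result.update(table)
    return result


node_inputs = [
    [("a", 1), ("b", 2), ("a", 3)],            # few groups: stays local
    [("c", 4), ("a", 5), ("d", 6), ("c", 7)],  # third group triggers the switch
]
totals = parallel_sum(node_inputs, num_nodes=2, max_local_groups=2)
```

With few groups per node, most of the work is done locally and little data is sent. With many groups, local aggregation saves little, so forwarding raw rows avoids wasted effort. The switch point here is a fixed count chosen for illustration only.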