Optimizing Reduction Computations In a Distributed Environment

Authors:
Tahsin Kurc;Feng Lee;Gagan Agrawal;Umit Catalyurek;Renato Ferreira;Joel Saltz
Affiliations:
Ohio State University, Columbus;Ohio State University, Columbus;Ohio State University, Columbus;Ohio State University, Columbus;Ohio State University, Columbus;Ohio State University, Columbus
Venue:
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Year:
2003

Abstract

We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated filter state, partitioned filter state, and hybrid options between these two extremes. We evaluate these strategies using emulators of three real applications, different query and output sizes, and a number of configurations. We consider execution in a homogeneous cluster and in a distributed environment where only a subset of nodes host the data. Our results show that replicating the filter state scales well and outperforms other schemes, if sufficient memory is available and sufficient computation is involved to offset the cost of the global merge step. In other cases, hybrid is usually the best. Moreover, in almost all cases, the performance of the hybrid strategy is quite close to that of the best strategy. Thus, we believe that hybrid is an attractive approach when the relative performance of different schemes cannot be predicted.
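
The contrast between the two extreme strategies can be illustrated with a small single-process sketch. This is not the authors' implementation: the function names, the (key, value) element format, summation as the reduction operator, and the modulo rule for assigning keys to owners are all assumptions made for illustration. The sketch assumes that "replicated" means each worker keeps a private copy of the whole accumulator that is combined in a global merge, and that "partitioned" means each accumulator entry lives on exactly one owner, so elements are forwarded to that owner instead of being merged later. Hybrid schemes are not shown.

```python
def replicated_reduce(chunks):
    """Replicated state (assumed model): one private accumulator per worker,
    followed by a global merge of all private copies."""
    partials = []
    for chunk in chunks:                      # one chunk per hypothetical worker
        acc = {}
        for key, value in chunk:
            acc[key] = acc.get(key, 0) + value
        partials.append(acc)

    merged = {}
    for acc in partials:                      # global merge step
        for key, value in acc.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def partitioned_reduce(chunks, num_owners):
    """Partitioned state (assumed model): each integer key is owned by one
    worker, so every element is forwarded to its owner and no merge is needed.
    Returns the result and the number of forwarded elements."""
    owners = [{} for _ in range(num_owners)]
    forwarded = 0
    for chunk in chunks:
        for key, value in chunk:
            owner = owners[key % num_owners]  # hypothetical ownership rule
            owner[key] = owner.get(key, 0) + value
            forwarded += 1

    result = {}
    for owner in owners:                      # owners hold disjoint keys
        result.update(owner)
    return result, forwarded


if __name__ == "__main__":
    data = [[(0, 1), (1, 2)], [(0, 3), (2, 4)]]
    print(replicated_reduce(data))
    print(partitioned_reduce(data, num_owners=2))
```

Under these assumptions, the replicated version needs memory for a full accumulator on every worker and pays for the merge, while the partitioned version needs only one copy of the accumulator overall but forwards every input element, which matches the memory and merge-cost trade-off the abstract describes.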