Improving performance of adaptive component-based dataflow middleware

Authors:
Timothy D. R. Hartley;Erik Saule;Ümit V. Çatalyürek
Affiliations:
Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA and Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA;Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA;Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA and Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
Venue:
Parallel Computing
Year:
2012

Abstract

Making the best use of modern computational resources for distributed applications requires either expert knowledge of low-level programming tools or a productive, high-performance, high-level programming framework. Unfortunately, even state-of-the-art high-level frameworks still require the developer to carry out a tedious manual tuning step to find the work partitioning that gives the best application execution performance. Here, we present a novel framework with which developers can easily create high-performance dataflow applications without the tedious tuning process. We compare the performance of our approach to that of three distributed programming frameworks that differ significantly in their programming paradigm, their support for multi-core CPUs and accelerators, and their load-balancing approach. These three frameworks are DataCutter, a component-based dataflow framework; KAAPI, a framework using asynchronous function calls; and MR-MPI, a MapReduce implementation. By highly optimizing the implementations of three applications on the four frameworks and comparing the execution times of the runtime engines, we show their strengths and weaknesses. We show that our approach achieves good performance for a wide range of applications at a much-reduced development cost.