Making the best use of modern computational resources for distributed applications requires either expert knowledge of low-level programming tools or a programming framework that is both productive and high-performance. Unfortunately, even state-of-the-art high-level frameworks still require the developer to carry out a tedious manual tuning step to find the work partitioning that gives the best execution performance. Here, we present a novel framework with which developers can easily create high-performance dataflow applications without this tuning process. We compare the performance of our approach to that of three distributed programming frameworks that differ significantly in their programming paradigm, their support for multi-core CPUs and accelerators, and their load-balancing approach: DataCutter, a component-based dataflow framework; KAAPI, a framework using asynchronous function calls; and MR-MPI, a MapReduce implementation. By highly optimizing the implementations of three applications on each of the four frameworks and comparing the execution times achieved by the runtime engines, we show the strengths and weaknesses of each. We show that our approach achieves good performance for a wide range of applications at a much lower development cost.