Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems

Authors:
Filip Blagojevic;Dimitrios S. Nikolopoulos;Alexandros Stamatakis;Christos D. Antonopoulos;Matthew Curtis-Maury
Affiliations:
Department of Computer Science and Center for High-End Computing Systems, Virginia Tech, 2202 Kraft Drive, Blacksburg, VA 24061, USA;Department of Computer Science and Center for High-End Computing Systems, Virginia Tech, 2202 Kraft Drive, Blacksburg, VA 24061, USA;School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Station 14, CH-1015 Lausanne, Switzerland;Department of Computer and Communication Engineering, University of Thessaly, 382 21 Volos, Greece;Department of Computer Science and Center for High-End Computing Systems, Virginia Tech, 2202 Kraft Drive, Blacksburg, VA 24061, USA
Venue:
Parallel Computing
Year:
2007

Citing 16
Cited 4

Optimal use of mixed task and data parallelism for pipelined computations

Journal of Parallel and Distributed Computing
Parallel algorithms for Bayesian phylogenetic inference

Journal of Parallel and Distributed Computing - High-performance computational biology
Optimizing Compiler for the CELL Processor

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A case study in top-down performance estimation for a large-scale parallel application

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The potential of the cell processor for scientific computing

Proceedings of the 3rd conference on Computing frontiers
Cell Multiprocessor Communication Network: Built for Speed

IEEE Micro
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models

Bioinformatics
PBPI: a high performance implementation of Bayesian phylogenetic inference

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
CellSs: a programming model for the cell BE architecture

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems)

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Dynamic multigrain parallelization on the cell broadband engine

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Dependence-based code generation for a CELL processor

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Optimizing the use of static buffers for DMA on a CELL chip

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing

Parallelism orchestration using DoPE: the degree of parallelism executive

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Adaptive parallel approximate similarity search for responsive multimedia retrieval

Proceedings of the 20th ACM international conference on Information and knowledge management
Parcae: a system for flexible parallel execution

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Adaptive parallelism for web search

Proceedings of the 8th ACM European Conference on Computer Systems

Abstract

We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterogeneous multi-core processors. Heterogeneous multi-core processors integrate conventional cores that run legacy codes with specialized cores that serve as computational accelerators. The term multi-grain parallelism refers to the exposure of multiple dimensions of parallelism from within the runtime system, so as to best exploit a parallel architecture with heterogeneous computational capabilities between its cores and execution units. We investigate user-level schedulers that dynamically "rightsize" the dimensions and degrees of parallelism on the Cell Broadband Engine. The schedulers address the problem of mapping application-specific concurrency to an architecture with multiple hardware layers of parallelism, without requiring programmer intervention or sophisticated compiler support. We evaluate recently introduced schedulers for event-driven execution and utilization-driven dynamic multi-grain parallelization on Cell. We also present a new scheduling scheme for dynamic multi-grain parallelism, S-MGPS, which uses sampling of dominant execution phases to converge to the optimal scheduling algorithm. We evaluate S-MGPS on an IBM Cell BladeCenter with two realistic bioinformatics applications that infer large phylogenies. S-MGPS performs within 2-10% of the optimal scheduling algorithm in these applications, while exhibiting low overhead and little sensitivity to application-dependent parameters.
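
The abstract gives no algorithmic detail for S-MGPS, so the following is only a minimal sketch of the general idea of sampling-based configuration selection, not the authors' implementation. It assumes a configuration is a pair (concurrent tasks, accelerator cores per task) and that a dominant phase can be timed under each candidate. The names candidate_configs, pick_config, and run_phase are hypothetical, and the cost model in the usage lines is a made-up stand-in for real timing measurements.

```python
from statistics import median
from typing import Callable, List, Tuple

# Hypothetical configuration: (concurrent tasks, accelerator cores per task).
Config = Tuple[int, int]


def candidate_configs(num_accel: int) -> List[Config]:
    """Enumerate (tasks, cores_per_task) pairs that use every accelerator core."""
    return [
        (num_accel // k, k)
        for k in range(1, num_accel + 1)
        if num_accel % k == 0
    ]


def pick_config(
    run_phase: Callable[[Config], float],
    configs: List[Config],
    samples: int = 3,
) -> Config:
    """Time a sampled phase under each configuration and keep the fastest.

    run_phase is an assumed callback that executes one instance of a
    dominant phase under the given configuration and returns its duration.
    The median over a few samples damps timing noise.
    """
    if not configs:
        raise ValueError("at least one candidate configuration is required")
    best, best_time = configs[0], float("inf")
    for cfg in configs:
        elapsed = median(run_phase(cfg) for _ in range(samples))
        if elapsed < best_time:
            best, best_time = cfg, elapsed
    return best


# Usage with a synthetic, deterministic cost model (an assumption made
# purely for illustration; a real run_phase would measure wall-clock time).
def toy_cost(cfg: Config) -> float:
    tasks, cores_per_task = cfg
    return abs(cores_per_task - 2) + 1.0


configs = candidate_configs(8)  # eight accelerator cores, as on one Cell chip
chosen = pick_config(toy_cost, configs)
```

Exhaustively sampling each candidate is workable here because the search space is small (four configurations for eight accelerator cores); a larger space would need pruning or a smarter search.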