Runtime scheduling of dynamic parallelism on accelerator-based multi-core systems

  • Authors:
  • Filip Blagojevic;Dimitrios S. Nikolopoulos;Alexandros Stamatakis;Christos D. Antonopoulos;Matthew Curtis-Maury

  • Affiliations:
  • Department of Computer Science and Center for High-End Computing Systems, Virginia Tech, 2202 Kraft Drive, Blacksburg, VA 24061, USA;Department of Computer Science and Center for High-End Computing Systems, Virginia Tech, 2202 Kraft Drive, Blacksburg, VA 24061, USA;School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Station 14, CH-1015 Lausanne, Switzerland;Department of Computer and Communication Engineering, University of Thessaly, 382 21 Volos, Greece;Department of Computer Science and Center for High-End Computing Systems, Virginia Tech, 2202 Kraft Drive, Blacksburg, VA 24061, USA

  • Venue:
  • Parallel Computing
  • Year:
  • 2007

Abstract

We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterogeneous multi-core processors. Heterogeneous multi-core processors integrate conventional cores that run legacy codes with specialized cores that serve as computational accelerators. The term multi-grain parallelism refers to the exposure of multiple dimensions of parallelism from within the runtime system, so as to best exploit a parallel architecture with heterogeneous computational capabilities between its cores and execution units. We investigate user-level schedulers that dynamically "rightsize" the dimensions and degrees of parallelism on the Cell Broadband Engine. The schedulers address the problem of mapping application-specific concurrency to an architecture with multiple hardware layers of parallelism, without requiring programmer intervention or sophisticated compiler support. We evaluate recently introduced schedulers for event-driven execution and utilization-driven dynamic multi-grain parallelization on Cell. We also present a new scheduling scheme for dynamic multi-grain parallelism, S-MGPS, which uses sampling of dominant execution phases to converge to the optimal scheduling algorithm. We evaluate S-MGPS on an IBM Cell BladeCenter with two realistic bioinformatics applications that infer large phylogenies. S-MGPS performs within 2-10% of the optimal scheduling algorithm in these applications, while exhibiting low overhead and little sensitivity to application-dependent parameters.