Tagged procedure calls (TPC): efficient runtime support for task-based parallelism on the Cell processor

  • Authors:
  • George Tzenakis;Konstantinos Kapelonis;Michail Alvanos;Konstantinos Koukos;Dimitrios S. Nikolopoulos;Angelos Bilas

  • Affiliations:
  • Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece (all authors)

  • Venue:
  • HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
  • Year:
  • 2010

Abstract

Increasing the number of cores in modern CPUs is the main trend for improving system performance. A central challenge is the runtime support that multi-core systems ought to provide for sustaining high performance and scalability without disproportionately increasing the effort required of the programmer. In this work we present Tagged Procedure Calls (TPC), a runtime system for supporting task-based programming models on architectures that require explicit data access specification by the programmer. We present the design and implementation of TPC for the Cell processor and examine how the runtime system can support task management functions with on-chip communication only. By minimizing off-chip transactions in the runtime, we achieve sub-microsecond task initiation latency and a minimum null task initiation/completion latency of 385 ns. We evaluate TPC with several kernels and applications, demonstrating that TPC achieves scalable on-chip execution of codes previously parallelized and optimized for shared-memory multiprocessors, can exploit additional fine-grain parallelism in codes previously parallelized at coarse levels of granularity, and performs competitively with existing task-based parallel programming frameworks that statically optimize data layout and task placement.
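To make the phrase "explicit data access specification by the programmer" concrete, the sketch below shows what spawning a task with per-argument access annotations might look like in C. The names (tpc_arg_t, TPC_IN/TPC_OUT/TPC_INOUT, tpc_spawn, tpc_wait_all) and the serial stub runtime are illustrative assumptions for this summary, not the paper's actual TPC API; a real runtime would use the annotations to DMA the declared buffers to and from an SPE's local store.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical access tags: the programmer declares how each argument is used. */
typedef enum { TPC_IN, TPC_OUT, TPC_INOUT } tpc_access_t;

typedef struct {
    void         *addr;   /* base address of the argument buffer */
    size_t        size;   /* size in bytes, so a runtime knows what to transfer */
    tpc_access_t  access; /* declared direction: in, out, or inout */
} tpc_arg_t;

typedef void (*tpc_task_fn)(tpc_arg_t *args, int nargs);

/* Serial stub standing in for the runtime: a real implementation would enqueue
 * the task, transfer IN/INOUT buffers to an SPE, run the task there, and copy
 * OUT/INOUT buffers back on completion. Here the task simply runs inline. */
static void tpc_spawn(tpc_task_fn fn, tpc_arg_t *args, int nargs) {
    fn(args, nargs);
}
static void tpc_wait_all(void) { /* no-op in the serial stub */ }

/* Example task: scale a float vector in place. */
static void scale_task(tpc_arg_t *args, int nargs) {
    (void)nargs;
    float *v = (float *)args[0].addr;
    size_t n = args[0].size / sizeof(float);
    float  k = *(float *)args[1].addr;
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

int main(void) {
    float vec[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float k = 2.0f;
    tpc_arg_t args[] = {
        { vec, sizeof(vec), TPC_INOUT },  /* data the task reads and writes */
        { &k,  sizeof(k),   TPC_IN    },  /* read-only scalar parameter */
    };
    tpc_spawn(scale_task, args, 2);
    tpc_wait_all();
    printf("%.1f %.1f %.1f %.1f\n", vec[0], vec[1], vec[2], vec[3]);
    return 0;
}
```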