Hardware and software tradeoffs for task synchronization on manycore architectures

Authors:
Yonghong Yan; Sanjay Chatterjee; Daniel A. Orozco; Elkin Garcia; Zoran Budimlić; Jun Shirako; Robert S. Pavel; Guang R. Gao; Vivek Sarkar
Affiliations:
Department of Computer Science, Rice University; Department of Computer Science, Rice University; Department of Electrical Engineering, University of Delaware; Department of Electrical Engineering, University of Delaware; Department of Computer Science, Rice University; Department of Computer Science, Rice University; Department of Electrical Engineering, University of Delaware; Department of Electrical Engineering, University of Delaware; Department of Computer Science, Rice University
Venue:
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Year:
2011

Abstract

Manycore architectures, with hundreds to thousands of cores per processor, are seen by many as a natural evolution of multicore processors. Taking advantage of this massive parallelism in practice requires a productive parallel programming model and an efficient runtime for scheduling and coordinating concurrent tasks. A critical prerequisite for an efficient runtime is a scalable synchronization mechanism that supports task coordination at different levels of granularity. This paper describes the implementation of a high-level synchronization construct called phasers on the IBM Cyclops64 manycore processor, and compares phasers to the lower-level synchronization primitives currently available to Cyclops64 programmers. Phasers support synchronization of dynamic tasks by allowing tasks to register and deregister with a phaser object. They unify point-to-point and collective synchronization behind easy-to-use interfaces, thereby offering productivity advantages over hardware primitives on manycores. We have experimented with several approaches to phaser implementation, using software, hardware, and a combination of both, to explore their portability and performance. The results show that a highly optimized phaser implementation delivers performance comparable to that obtained with lower-level synchronization primitives. We also demonstrate the success of the hardware optimizations proposed for phasers.
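To make the register/deregister idea concrete, the following is a minimal sketch of a phaser-like object in Python. It is not the Cyclops64 implementation described in the paper. The class name `Phaser`, the method names `register`, `deregister`, `signal`, `wait`, and `next`, and the helper `run_demo` are all hypothetical names chosen for illustration. The sketch also assumes that a task deregisters only between phases, that is, before it has signalled the current one.

```python
import threading


class Phaser:
    """Toy phaser: a barrier whose set of participants can change over time.

    Illustrative only; assumes a task never deregisters after it has
    already signalled the current phase.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._registered = 0  # tasks currently registered
        self._arrived = 0     # registered tasks that have signalled this phase
        self._phase = 0       # current phase number

    def register(self):
        with self._cond:
            self._registered += 1

    def deregister(self):
        # A departing task may complete the phase for the tasks that remain.
        with self._cond:
            self._registered -= 1
            self._advance_if_complete()

    def signal(self):
        """Arrive without blocking; return the phase that was signalled."""
        with self._cond:
            phase = self._phase
            self._arrived += 1
            self._advance_if_complete()
            return phase

    def wait(self, phase):
        """Block until the given phase has completed."""
        with self._cond:
            self._cond.wait_for(lambda: self._phase > phase)

    def next(self):
        """Collective barrier: signal, then wait for the phase to complete."""
        self.wait(self.signal())

    def _advance_if_complete(self):
        # Caller must hold the lock.
        if self._registered > 0 and self._arrived == self._registered:
            self._arrived = 0
            self._phase += 1
            self._cond.notify_all()


def run_demo(num_tasks=4, phases=3):
    """Run worker threads through several phases; task 0 leaves after one."""
    ph = Phaser()
    log = []
    log_lock = threading.Lock()

    def worker(tid, my_phases):
        for p in range(my_phases):
            with log_lock:
                log.append((p, tid))
            ph.next()
        ph.deregister()

    threads = []
    for tid in range(num_tasks):
        ph.register()  # register before starting so no task misses a phase
        my_phases = 1 if tid == 0 else phases
        threads.append(threading.Thread(target=worker, args=(tid, my_phases)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for phase, tid in run_demo():
        print(f"phase {phase}: task {tid}")
```

In this sketch, `next` plays the role of a collective barrier, while calling `signal` in one task and `wait` in another gives a point-to-point pattern: the producer never blocks, and the consumer blocks only until the producer's phase has completed. Because task 0 deregisters after its first phase, the remaining tasks keep advancing without it, which is the dynamic-membership property the abstract refers to.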