As multiprocessors scale beyond the limits of a few tens of processors, they must look beyond traditional methods of synchronization to minimize serialization and achieve the high degrees of parallelism required to utilize large machines. By allowing synchronization at the level of the smallest unit of memory, fine-grain synchronization achieves these goals. Unfortunately, supporting efficient fine-grain synchronization without inordinate amounts of hardware has remained a challenge. This paper describes the support for fine-grain synchronization provided by the Alewife system. The premise underlying Alewife's implementation is that successful synchronization attempts are the common case when serialization is minimized through word-level synchronization. Efficiency at low hardware cost is achieved by providing hardware support to streamline successful synchronization attempts and relegating other operations to software. Alewife provides a large synchronization name space by associating full-empty bits with each memory word. Successful synchronization attempts execute at normal load-store speeds, while attempts that fail invoke appropriate software trap handlers through a fast trap mechanism. The software handlers deal with the issues of retrying versus blocking, queueing and rescheduling. The efficiency of Alewife's mechanisms is analyzed by comparing the costs of various synchronization operations and parallel application execution time.
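The division of labor described above (a cheap common case for successful attempts, a software handler for failed ones) can be illustrated with a small software model. This is only a sketch under stated assumptions, not Alewife's implementation: the class `SyncWord`, the methods `write_and_fill` and `read_and_empty`, the `spin_limit` parameter, and the retry-then-block policy are all hypothetical names and choices made for illustration, and a Python lock stands in for what the hardware would do in a single load or store.

```python
import threading


class SyncWord:
    """Software model of one memory word with a full/empty bit.

    Hypothetical illustration only; real hardware would perform the
    fast path as part of an ordinary load or store.
    """

    def __init__(self):
        self._value = None
        self._full = False                  # the full/empty bit
        self._cond = threading.Condition()
        self.fast_hits = 0                  # attempts that succeeded at once
        self.traps = 0                      # attempts sent to the handler

    def write_and_fill(self, value, spin_limit=100):
        """Store into an empty word and mark it full."""
        with self._cond:
            if not self._full:
                self.fast_hits += 1         # common case: no handler needed
            else:
                self.traps += 1
                self._handle_failure(lambda: not self._full, spin_limit)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_and_empty(self, spin_limit=100):
        """Load from a full word and mark it empty."""
        with self._cond:
            if self._full:
                self.fast_hits += 1         # common case: no handler needed
            else:
                self.traps += 1
                self._handle_failure(lambda: self._full, spin_limit)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value

    def _handle_failure(self, ready, spin_limit):
        """Stand-in for a software trap handler (assumed policy).

        Retries a bounded number of times, then blocks until notified.
        """
        for _ in range(spin_limit):
            self._cond.wait(timeout=0)      # briefly release the lock, then recheck
            if ready():
                return
        while not ready():
            self._cond.wait()               # block until the other side signals


# Uncontended use: both operations take the fast path.
word = SyncWord()
word.write_and_fill(7)
print(word.read_and_empty(), word.fast_hits, word.traps)  # prints: 7 2 0
```

In this model the counters `fast_hits` and `traps` make the abstract's premise observable: when producers and consumers rarely collide on the same word, nearly every attempt is a fast hit, and the retry-versus-block decision in `_handle_failure` is exercised only on the uncommon failures.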