Waiting algorithms for synchronization in large-scale multiprocessors

Authors: Beng-Hong Lim; Anant Agarwal
Affiliations: Massachusetts Institute of Technology, Cambridge (both authors)
Venue: ACM Transactions on Computer Systems (TOCS)
Year: 1993



Abstract

Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a thread first waits by polling a synchronization variable. If the cost of polling reaches a limit $L_{poll}$ and further waiting is necessary, the thread is blocked, incurring an additional fixed cost, $B$. The choice of $L_{poll}$ is a critical determinant of the performance of two-phase algorithms. We focus on methods for statically determining $L_{poll}$ because the run-time overhead of dynamically determining $L_{poll}$ can be comparable to the cost of blocking in large-scale multiprocessor systems with lightweight threads.

Our experiments show that always-block ($L_{poll} = 0$) is a good waiting algorithm with performance that is usually close to the best of the algorithms compared. We show that even better performance can be achieved with a static choice of $L_{poll}$ based on knowledge of likely wait-time distributions. Motivated by the observation that different synchronization types exhibit different wait-time distributions, we prove that a static choice of $L_{poll}$ can yield close to optimal on-line performance against an adversary that is restricted to choosing wait times from a fixed family of probability distributions. This result allows us to make an optimal static choice of $L_{poll}$ based on synchronization type. For exponentially distributed wait times, we prove that setting $L_{poll} = \ln(e-1)\,B$ results in a waiting cost that is no more than $e/(e-1)$ times the cost of an optimal off-line algorithm. For uniformly distributed wait times, we prove that setting $L_{poll} = \frac{1}{2}(\sqrt{5}-1)\,B$ results in a waiting cost that is no more than $(\sqrt{5}+1)/2$ (the golden ratio) times the cost of an optimal off-line algorithm. Experimental measurements of several parallel applications on the Alewife multiprocessor simulator corroborate our theoretical findings.
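
The two bounds quoted in the abstract can be checked numerically with a minimal sketch. The cost model here is an assumption, not taken from the paper: polling is assumed to cost one unit per unit of wait time, the off-line algorithm is assumed to pay min(t, B) for a wait of length t, and the uniform family is assumed to be Uniform(0, u) with u chosen by the adversary. All function and variable names are hypothetical, and the expected-cost formulas are derived from this assumed model.

```python
import math

def two_phase_cost(t, l_poll, b):
    """Cost of one wait of length t: poll up to l_poll, then block at cost b.
    Assumes polling costs one unit per unit of wait time."""
    return t if t <= l_poll else l_poll + b

def offline_cost(t, b):
    """Assumed off-line optimum: knowing t, poll if t < b, else block at once."""
    return min(t, b)

# Static choices of L_poll from the abstract, as multiples of B.
ALPHA_EXP = math.log(math.e - 1)       # exponential wait times
ALPHA_UNIF = (math.sqrt(5) - 1) / 2    # uniform wait times

# Competitive bounds from the abstract.
BOUND_EXP = math.e / (math.e - 1)
BOUND_UNIF = (math.sqrt(5) + 1) / 2

def exp_ratio(rate, alpha, b=1.0):
    """Expected two-phase cost over expected off-line cost, waits ~ Exp(rate)."""
    l = alpha * b
    online = (1 - math.exp(-rate * l)) / rate + b * math.exp(-rate * l)
    offline = (1 - math.exp(-rate * b)) / rate
    return online / offline

def unif_ratio(u, alpha, b=1.0):
    """Same ratio for waits ~ Uniform(0, u)."""
    l = alpha * b
    if u <= l:
        online = u / 2                                  # never blocks
    else:
        online = (l * l / 2 + (u - l) * (l + b)) / u    # polls, then may block
    offline = u / 2 if u <= b else (b * b / 2 + (u - b) * b) / u
    return online / offline

# Let the "adversary" sweep the distribution parameter over a grid (B = 1).
grid = [k / 100 for k in range(1, 1001)]
worst_exp = max(exp_ratio(r, ALPHA_EXP) for r in grid)
worst_unif = max(unif_ratio(u, ALPHA_UNIF) for u in grid)

print(f"exponential: worst ratio {worst_exp:.6f}, bound {BOUND_EXP:.6f}")
print(f"uniform:     worst ratio {worst_unif:.6f}, bound {BOUND_UNIF:.6f}")
```

Under these assumptions the worst ratio found on the grid matches each quoted bound. This is a sanity check of the constants, not a reproduction of the paper's proofs or experiments.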