Through analysis and experiments, this paper investigates two-phase waiting algorithms to minimize the cost of waiting for synchronization in large-scale multiprocessors. In a two-phase algorithm, a thread first waits by polling a synchronization variable. If the cost of polling reaches a limit $L_{poll}$ and further waiting is necessary, the thread is blocked, incurring an additional fixed cost, $B$. The choice of $L_{poll}$ is a critical determinant of the performance of two-phase algorithms. We focus on methods for statically determining $L_{poll}$ because the run-time overhead of dynamically determining $L_{poll}$ can be comparable to the cost of blocking in large-scale multiprocessor systems with lightweight threads.

Our experiments show that always-block ($L_{poll} = 0$) is a good waiting algorithm with performance that is usually close to the best of the algorithms compared. We show that even better performance can be achieved with a static choice of $L_{poll}$ based on knowledge of likely wait-time distributions. Motivated by the observation that different synchronization types exhibit different wait-time distributions, we prove that a static choice of $L_{poll}$ can yield close to optimal on-line performance against an adversary that is restricted to choosing wait times from a fixed family of probability distributions. This result allows us to make an optimal static choice of $L_{poll}$ based on synchronization type. For exponentially distributed wait times, we prove that setting $L_{poll} = \ln(e-1)\,B$ results in a waiting cost that is no more than $e/(e-1)$ times the cost of an optimal off-line algorithm. For uniformly distributed wait times, we prove that setting $L_{poll} = \frac{1}{2}(\sqrt{5}-1)\,B$ results in a waiting cost that is no more than $(\sqrt{5}+1)/2$ (the golden ratio) times the cost of an optimal off-line algorithm. Experimental measurements of several parallel applications on the Alewife multiprocessor simulator corroborate our theoretical findings.
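The two competitive bounds stated above can be checked numerically with a minimal sketch. It rests on assumptions that are not taken from the paper: polling cost is assumed to accrue at one unit per unit of wait time, the optimal off-line algorithm is assumed to pay min(wait, B), and the closed-form expectations in the comments are derived under that simplified model. All function and variable names are illustrative only.

```python
import math


def two_phase_cost(wait, l_poll, block_cost):
    """Cost of one wait: poll for up to l_poll, then block at a fixed cost.

    Assumption: polling costs one unit per unit of wait time.
    """
    if wait <= l_poll:
        return wait
    return l_poll + block_cost


def offline_cost(wait, block_cost):
    """Assumed off-line optimum: knows the wait time, so it pays min(wait, B)."""
    return min(wait, block_cost)


def exp_ratio(rate, l_poll, block_cost):
    """Expected two-phase cost over expected off-line cost, wait ~ Exp(rate).

    E[two-phase] = (1 - exp(-rate*L)) / rate + B * exp(-rate*L)
    E[off-line]  = (1 - exp(-rate*B)) / rate
    """
    poll_part = -math.expm1(-rate * l_poll) / rate
    block_part = block_cost * math.exp(-rate * l_poll)
    offline = -math.expm1(-rate * block_cost) / rate
    return (poll_part + block_part) / offline


def uniform_ratio(upper, l_poll, block_cost):
    """Same ratio for wait ~ Uniform(0, upper)."""

    def expected(threshold, penalty):
        # E[cost] where cost = t if t <= threshold, else threshold + penalty.
        if upper <= threshold:
            return upper / 2
        below = threshold ** 2 / 2
        above = (upper - threshold) * (threshold + penalty)
        return (below + above) / upper

    # The off-line cost min(t, B) is the same shape with zero penalty.
    return expected(l_poll, block_cost) / expected(block_cost, 0.0)


B = 1.0
# The adversary picks the distribution parameter; scan a grid of choices.
grid = [0.01 * k for k in range(1, 10001)]

l_exp = math.log(math.e - 1) * B
worst_exp = max(exp_ratio(rate, l_exp, B) for rate in grid)

l_uni = (math.sqrt(5) - 1) / 2 * B
worst_uni = max(uniform_ratio(upper * B, l_uni, B) for upper in grid)

print("exponential: worst ratio", worst_exp, "bound", math.e / (math.e - 1))
print("uniform:     worst ratio", worst_uni, "bound", (math.sqrt(5) + 1) / 2)
```

Scanning a grid only samples the adversary's choices, so this illustrates the bounds rather than proving them. Setting `l_poll` to 0 in either ratio function gives the always-block algorithm under the same assumed cost model.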