Adaptive backoff synchronization techniques

Authors:
A. Agarwal; M. Cherian
Affiliations:
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA; Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Venue:
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Year:
1989

Abstract

Shared-memory multiprocessors commonly use shared variables for synchronization. Our simulations of real parallel applications show that large-scale cache-coherent multiprocessors suffer significant amounts of invalidation traffic due to synchronization. Large multiprocessors that do not cache synchronization variables are often more severely affected. If this synchronization traffic is not reduced or managed adequately, synchronization references can cause severe congestion in the network. We propose a class of adaptive backoff methods that do not use any extra hardware and can significantly reduce the memory traffic to synchronization variables. These methods use synchronization state to reduce polling of synchronization variables. Our simulations show that when the number of processors participating in a barrier synchronization is small compared to the time of arrival of the processors, reductions of 20 percent to over 95 percent in synchronization traffic can be achieved at no extra cost. In other situations, adaptive backoff techniques result in a tradeoff between reduced network accesses and increased processor idle time.
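
The idea of using synchronization state to reduce polling can be illustrated with a small discrete-time model. The sketch below is not the authors' simulator or their exact algorithms. It assumes one plausible policy: a processor arriving at a barrier first waits in proportion to the number of processors still missing, then doubles its delay after each unsuccessful poll. The function name `simulate_barrier` and the parameters `unit` and `max_delay` are hypothetical, chosen only for this illustration.

```python
def simulate_barrier(arrival_times, adaptive=False, unit=1, max_delay=64):
    """Toy model of barrier polling (an illustrative assumption, not the paper's simulator).

    Returns (polls, overshoot): the total number of polls of the barrier
    variable, and the total time processors keep waiting after the barrier
    has already opened.
    """
    n = len(arrival_times)
    release = max(arrival_times)  # the barrier opens when the last processor arrives
    polls = 0
    overshoot = 0
    # rank is the barrier count a processor observes when it arrives
    for rank, t in enumerate(sorted(arrival_times), start=1):
        if rank == n:
            continue  # the last arriver opens the barrier and never polls
        if adaptive:
            # Backoff on synchronization state: wait longer when more
            # processors are still missing.
            now = t + (n - rank) * unit
            delay = unit
        else:
            # Plain spinning: poll on every time step.
            now = t + 1
            delay = 1
        while True:
            polls += 1
            if now >= release:
                overshoot += now - release
                break
            if adaptive:
                # Exponential backoff after each unsuccessful poll, capped.
                delay = min(delay * 2, max_delay)
            now += delay
    return polls, overshoot


if __name__ == "__main__":
    arrivals = [0, 10, 20, 100]
    print("spinning:", simulate_barrier(arrivals))
    print("adaptive:", simulate_barrier(arrivals, adaptive=True))
```

With the made-up arrival times above, plain spinning issues 270 polls with no overshoot, while the adaptive variant issues 21 polls but accumulates 114 time steps of overshoot. These numbers come from the toy model only, but they show the same kind of tradeoff the abstract describes between reduced network accesses and increased processor idle time.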