Empirical studies of competitive spinning for a shared-memory multiprocessor

Authors:
Anna R. Karlin; Kai Li; Mark S. Manasse; Susan Owicki
Affiliations:
DEC Systems Research Center, 130 Lytton Ave., Palo Alto, CA (Karlin, Manasse, Owicki); Dept. of Computer Science, Princeton University, Princeton, NJ (Li)
Venue:
SOSP '91: Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles
Year:
1991

Abstract

A common operation in multiprocessor programs is acquiring a lock to protect access to shared data. Typically, the requesting thread is blocked if the lock it needs is held by another thread. The cost of blocking one thread and activating another can be a substantial part of program execution time. Alternatively, the thread could spin until the lock is free, or spin for a while and then block. Spinning may avoid context-switch overhead, but processor cycles may then be wasted in unproductive spinning. This paper studies seven strategies for deciding whether, and for how long, to spin before blocking. Of particular interest are competitive strategies, whose cost can be shown to be no worse than a constant factor times the cost of an optimal off-line strategy. The performance of five competitive strategies is compared with that of always blocking, always spinning, and the optimal off-line algorithm. Measurements of lock-waiting time distributions for five parallel programs were used to compare the cost of synchronization under all the strategies. Additional measurements of elapsed time for some of the programs and strategies made it possible to assess the impact of the synchronization strategy on overall program performance. Both types of measurements indicate that the standard blocking strategy performs poorly compared to mixed (spin-then-block) strategies. Among the mixed strategies studied, adaptive algorithms perform better than non-adaptive ones.
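The spin-versus-block trade-off and the notion of a competitive strategy can be illustrated with a small cost model. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a waiting thread pays one unit per unit of time spent spinning and a fixed cost C (a hypothetical stand-in for the context-switch overhead) if it blocks, and that the optimal off-line strategy knows the waiting time in advance. Under this model, spinning for exactly C and then blocking pays the optimum whenever the wait is at most C and pays 2C when the optimum pays C, so it is 2-competitive. All function names and the sample numbers are hypothetical.

```python
"""Illustrative cost model for spin-then-block lock waiting.

Assumptions (not taken from the paper): spinning costs one unit per unit of
waiting time, and blocking costs a fixed `block_cost` C that stands in for
the context-switch overhead of suspending and later resuming the thread.
"""

import math
from typing import Callable, Iterable

# A strategy maps (wait, block_cost) to the cost the waiting thread pays.
Strategy = Callable[[float, float], float]


def spin_then_block_cost(wait: float, threshold: float, block_cost: float) -> float:
    """Spin for up to `threshold` time units, then block if the lock is still held."""
    if wait <= threshold:
        return wait                    # lock released while we were spinning
    return threshold + block_cost      # spun in vain, then paid to block


def always_spin_cost(wait: float, block_cost: float) -> float:
    """Never block: the whole waiting time is spent spinning."""
    return spin_then_block_cost(wait, math.inf, block_cost)


def always_block_cost(wait: float, block_cost: float) -> float:
    """Block immediately whenever the lock is held (wait > 0)."""
    return spin_then_block_cost(wait, 0.0, block_cost)


def spin_c_then_block_cost(wait: float, block_cost: float) -> float:
    """Fixed-threshold rule: spin for exactly C, then block (2-competitive here)."""
    return spin_then_block_cost(wait, block_cost, block_cost)


def optimal_offline_cost(wait: float, block_cost: float) -> float:
    """The off-line optimum knows `wait` in advance and picks the cheaper action."""
    return min(wait, block_cost)


def competitive_ratio(strategy: Strategy, waits: Iterable[float], block_cost: float) -> float:
    """Worst ratio of a strategy's cost to the off-line optimum over positive waits."""
    return max(
        strategy(w, block_cost) / optimal_offline_cost(w, block_cost)
        for w in waits
        if w > 0
    )


if __name__ == "__main__":
    C = 100.0  # hypothetical context-switch cost, in the same units as waits
    waits = [1, 10, 50, 99, 100, 101, 500, 10_000]
    for name, strategy in [
        ("always spin", always_spin_cost),
        ("always block", always_block_cost),
        ("spin C, then block", spin_c_then_block_cost),
    ]:
        print(f"{name:>18}: worst ratio {competitive_ratio(strategy, waits, C):.2f}")
```

With these sample waits, the spin-for-C rule never exceeds twice the off-line cost, while always spinning and always blocking each reach a ratio of 100: always spinning loses on the longest wait and always blocking on the shortest. The pure strategies have no constant bound, because their worst ratio grows without limit as the waits become more extreme.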