The performance implications of thread management alternatives for shared-memory multiprocessors

Authors:
T. E. Anderson;D. D. Lazowska;H. M. Levy
Affiliations:
Department of Computer Science, University of Washington, Seattle WA;Department of Computer Science, University of Washington, Seattle WA;Department of Computer Science, University of Washington, Seattle WA
Venue:
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Year:
1989

Citing 13
Cited 14

Quantitative system performance: computer system analysis using queueing network models

Quantitative system performance: computer system analysis using queueing network models
Adaptive load sharing in homogeneous distributed systems

IEEE Transactions on Software Engineering
Cache coherence protocols: evaluation using a multiprocessor simulation model

ACM Transactions on Computer Systems (TOCS)
Parallel Processing in Ada

Computer
Fine-grained mobility in the Emerald system

ACM Transactions on Computer Systems (TOCS)
Firefly: A Multiprocessor Workstation

IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
PRESTO: a system for object-oriented parallel programming

Software—Practice & Experience
An open environment for building parallel programming systems

PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Experience with processes and monitors in Mesa

Communications of the ACM
Communicating sequential processes

Communications of the ACM
Ethernet: distributed packet switching for local computer networks

Communications of the ACM
Modelling and analysis of distributed software systems

SOSP '79 Proceedings of the seventh ACM symposium on Operating systems principles
A short introduction to Concurrent Euclid

ACM SIGPLAN Notices

The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors

IEEE Transactions on Computers
Cache considerations for multiprocessor programmers

Communications of the ACM
Snoopy cache test-and-test-and-set without excessive bus contention

ACM SIGARCH Computer Architecture News
Parallel programs and background load: efficiency studies with the PAR-Bench system

ICS '91 Proceedings of the 5th international conference on Supercomputing
Efficient synchronization: let them eat QOLB

Proceedings of the 24th annual international symposium on Computer architecture
Empirical studies of competitive spinning for a shared-memory multiprocessor

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Performance counters and state sharing annotations: a unified approach to thread locality

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Scheduling threads for low space requirement and good locality

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Language Portability Across Shared Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
(De-) Clustering Objects for Multiprocessor System Software

IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
Flexible, Low-overhead Event Logging to Support Resource Scheduling

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
Portability events: a programming model for scalable system infrastructures

Proceedings of the 3rd workshop on Programming languages and operating systems: linguistic support for modern operating systems
How many threads to spawn during program multithreading?

LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing

Quantified Score

Hi-index	0.02

Abstract

Threads (“lightweight” processes) have become a common element of new languages and operating systems. This paper examines the performance implications of several data structure and algorithm alternatives for thread management in shared-memory multiprocessors. Both experimental measurements and analytical model projections are presented.

For applications with fine-grained parallelism, small differences in thread management are shown to have significant performance impact, often posing a tradeoff between throughput and latency. Per-processor data structures can be used to improve throughput, and in some circumstances to avoid locking, improving latency as well.

The method used by processors to queue for locks is also shown to affect performance significantly. Normal methods of critical resource waiting can substantially degrade performance with moderate numbers of waiting processors. We present an Ethernet-style backoff algorithm that largely eliminates this effect.
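
The abstract does not give the details of the backoff algorithm, so the following is only a minimal sketch of the general idea of Ethernet-style (randomized exponential) backoff applied to lock waiting, not the authors' implementation. The class name `BackoffSpinLock`, the helper `run_workers`, and the delay parameters are hypothetical, and a non-blocking `threading.Lock.acquire` is assumed here as a stand-in for a hardware test-and-set instruction.

```python
import random
import threading
import time


class BackoffSpinLock:
    """Spin lock whose waiters back off exponentially (illustrative sketch)."""

    def __init__(self, base_delay=1e-6, max_delay=1e-3):
        # Assumption: a non-blocking acquire on threading.Lock stands in
        # for an atomic test-and-set instruction.
        self._flag = threading.Lock()
        self._base_delay = base_delay  # hypothetical initial backoff window (s)
        self._max_delay = max_delay    # hypothetical cap on the window (s)

    def acquire(self):
        """Spin until the lock is obtained; return the number of failed attempts."""
        delay = self._base_delay
        failures = 0
        while not self._flag.acquire(blocking=False):
            failures += 1
            # Wait a random time within the current window, then double the
            # window, in the manner of Ethernet's binary exponential backoff.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, self._max_delay)
        return failures

    def release(self):
        self._flag.release()


def run_workers(num_threads=4, increments=200):
    """Increment a shared counter under the lock from several threads."""
    lock = BackoffSpinLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                counter[0] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Each failed attempt widens the waiting window, so a waiter retries less often the longer the lock stays busy; the intent is to reduce the contention that many simultaneously spinning processors would otherwise generate. Calling `run_workers(4, 200)` should leave the shared counter at 800 if mutual exclusion holds.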