The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors

Authors:
T. E. Anderson; E. D. Lazowska; H. M. Levy
Venue:
IEEE Transactions on Computers
Year:
1989

Abstract

This paper examines the performance implications of several data structure and algorithm alternatives for thread management in shared-memory multiprocessors. Both experimental measurements and analytical model projections are presented. For applications with fine-grained parallelism, small differences in thread management are shown to have a significant performance impact, often posing a tradeoff between throughput and latency. Per-processor data structures can be used to improve throughput and, in some circumstances, to avoid locking, improving latency as well. The method used by processors to queue for locks is also shown to affect performance significantly: normal methods of waiting for a critical resource can substantially degrade performance with moderate numbers of waiting processors. The authors present an Ethernet-style backoff algorithm that largely eliminates this effect.
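
The abstract does not give the details of the backoff algorithm, so the following is only a minimal sketch of the general idea of Ethernet-style backoff applied to lock acquisition, not the authors' implementation. It assumes that a waiter which fails to get the lock sleeps for a random interval below a ceiling and that the ceiling doubles after each failure up to a cap. The names `BackoffLock`, `run_workers`, `base_delay`, and `max_delay` are hypothetical, the delay values are arbitrary, and a non-blocking `threading.Lock.acquire` stands in for a hardware test-and-set instruction.

```python
import random
import threading
import time


class BackoffLock:
    """Spin lock that waits a random, growing interval after each failed attempt."""

    def __init__(self, base_delay=1e-6, max_delay=1e-3):
        # threading.Lock stands in for the hardware test-and-set word (assumption).
        self._flag = threading.Lock()
        self.base_delay = base_delay  # assumed initial backoff ceiling, in seconds
        self.max_delay = max_delay    # assumed cap on the backoff ceiling, in seconds

    def acquire(self):
        ceiling = self.base_delay
        # A non-blocking acquire plays the role of one test-and-set attempt.
        while not self._flag.acquire(blocking=False):
            # Wait a random time below the current ceiling, then double the
            # ceiling (up to the cap), in the style of Ethernet collision handling.
            time.sleep(random.uniform(0.0, ceiling))
            ceiling = min(2.0 * ceiling, self.max_delay)

    def release(self):
        self._flag.release()


def run_workers(num_threads=4, increments=200):
    """Have several threads increment a shared counter under the backoff lock."""
    lock = BackoffLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                counter[0] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling `run_workers(4, 200)` runs four threads that each increment the counter 200 times; because every increment happens while the lock is held, the returned total equals the number of increments requested. The sketch illustrates the waiting policy only. It says nothing about the performance effects measured in the paper, which depend on real multiprocessor hardware.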