Adaptive load sharing in homogeneous distributed systems
IEEE Transactions on Software Engineering
Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
Computer
Fine-grained mobility in the Emerald system
ACM Transactions on Computer Systems (TOCS)
Firefly: A Multiprocessor Workstation
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
PRESTO: a system for object-oriented parallel programming
Software—Practice & Experience
An open enviornment for building parallel programming systems
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Workcrews: an abstraction for controlling parallelism
International Journal of Parallel Programming
Design Tradeoffs for Process Scheduling in Shared Memory Multiprocessor Systems
IEEE Transactions on Software Engineering
Adaptive backoff synchronization techniques
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
The performance implications of thread management alternatives for shared-memory multiprocessors
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Experience with processes and monitors in Mesa
Communications of the ACM
Communicating sequential processes
Communications of the ACM
Ethernet: distributed packet switching for local computer networks
Communications of the ACM
Modelling and analysis of distributed software systems
SOSP '79 Proceedings of the seventh ACM symposium on Operating systems principles
A short introduction to Concurrent Euclid
ACM SIGPLAN Notices
The performance of an object-oriented threads package
OOPSLA/ECOOP '90 Proceedings of the European conference on object-oriented programming on Object-oriented programming systems, languages, and applications
Quartz: a tool for tuning parallel program performance
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
The interaction of architecture and operating system design
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
User-level interprocess communication for shared memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Scheduler activations: effective kernel support for the user-level management of parallelism
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Using continuations to implement thread management and communication in operating systems
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
A customizable substrate for concurrent languages
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Scheduling in parallel systems with a hierarchical organization of tasks
ICS '92 Proceedings of the 6th international conference on Supercomputing
Fast mutual exclusion for uniprocessors
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Scheduler activations: effective kernel support for the user-level management of parallelism
ACM Transactions on Computer Systems (TOCS)
Chores: enhanced run-time support for shared-memory parallel computing
ACM Transactions on Computer Systems (TOCS)
Recent trends in experimental operating systems research
PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
A machine independent interface for lightweight threads
ACM SIGOPS Operating Systems Review
A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
IEEE/ACM Transactions on Networking (TON)
Thread scheduling for cache locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Performance measurements for multithreaded programs
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Affinity scheduling of unbalanced workloads
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Characterizing the Performance of Algorithms for Lock-Free Objects
IEEE Transactions on Computers
The Effect of Scheduling Discipline on Spin Overhead in Shared Memory Parallel Systems
IEEE Transactions on Parallel and Distributed Systems
Using Processor Affinity in Loop Scheduling on Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Implementation of Scalable Blocking Locks Using an Adaptive Thread Scheduler
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
PANDA - Supporting Distributed Programming in C++
ECOOP '93 Proceedings of the 7th European Conference on Object-Oriented Programming
Static Analyses for Eliminating Unnecessary Synchronization from Java Programs
SAS '99 Proceedings of the 6th International Symposium on Static Analysis
Flexible Control of Parallelism in a Multiprocessor PC Router
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The Nachos instructional operating system
USENIX'93 Proceedings of the USENIX Winter 1993 Conference Proceedings on USENIX Winter 1993 Conference Proceedings
Performance issues in parallelized network protocols
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Distributed filaments: efficient fine-grain parallelism on a cluster of workstations
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Communications of the ACM - Security in the Browser
Dynamic load balancing in MPI jobs
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
The effectiveness of affinity-based scheduling in multiprocessor networking
INFOCOM'96 Proceedings of the Fifteenth annual joint conference of the IEEE computer and communications societies conference on The conference on computer communications - Volume 1
Hi-index | 14.99 |
An examination is made of the performance implications of several data structure and algorithm alternatives for thread management in shared-memory multiprocessors. Both experimental measurements and analytical model projections are presented. For applications with fine-grained parallelism, small differences in thread management are shown to have significant performance impact, often posing a tradeoff between throughput and latency. Per-processor data structures can be used to to improve throughput, and in some circumstances to avoid locking, improving latency as well. The method used by processors to queue for locks is also shown to affect performance significantly. Normal methods of critical resource waiting can substantially degrade performance with moderate numbers of waiting processors. The authors present an Ethernet-style backoff algorithm that largely eliminates this effect.