The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Computers
Threads (“lightweight” processes) have become a common element of new languages and operating systems. This paper examines the performance implications of several data structure and algorithm alternatives for thread management in shared-memory multiprocessors. Both experimental measurements and analytical model projections are presented.

For applications with fine-grained parallelism, small differences in thread management are shown to have significant performance impact, often posing a tradeoff between throughput and latency. Per-processor data structures can be used to improve throughput, and in some circumstances to avoid locking, improving latency as well.

The method used by processors to queue for locks is also shown to affect performance significantly. Normal methods of critical resource waiting can substantially degrade performance with moderate numbers of waiting processors. We present an Ethernet-style backoff algorithm that largely eliminates this effect.
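The abstract does not spell out the backoff algorithm, so the following is only a minimal sketch of the general idea, assuming it resembles Ethernet's binary exponential backoff: a waiter that fails to take the lock pauses for a random interval and doubles the upper bound on that interval after each failure. The class name `BackoffSpinLock`, the helper `run_demo`, and the delay parameters are hypothetical, and a non-blocking `threading.Lock` acquire stands in for a hardware test-and-set instruction.

```python
import random
import threading
import time


class BackoffSpinLock:
    """Illustrative spin lock whose waiters back off after each failed attempt.

    Assumption: modeled on Ethernet-style binary exponential backoff,
    not taken from the paper itself.
    """

    def __init__(self, base_delay=1e-6, max_delay=1e-3):
        # threading.Lock stands in for a hardware test-and-set word.
        self._flag = threading.Lock()
        self._base = base_delay  # hypothetical initial bound, in seconds
        self._max = max_delay    # hypothetical cap on the bound, in seconds

    def acquire(self):
        delay = self._base
        # acquire(blocking=False) plays the role of test-and-set:
        # it returns True only if this caller obtained the lock.
        while not self._flag.acquire(blocking=False):
            # Failed attempt: wait a random time in [0, delay],
            # then double the bound up to the cap.
            time.sleep(random.uniform(0, delay))
            delay = min(2 * delay, self._max)

    def release(self):
        self._flag.release()


def run_demo(n_threads=4, n_iters=200):
    """Increment a shared counter under the lock and return its final value."""
    lock = BackoffSpinLock()
    counter = [0]

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            try:
                counter[0] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling `run_demo(4, 200)` exercises the lock from four threads. The random, growing delay is meant to spread out retries so that waiting processors do not all contend for the lock at the same instant, which is the effect the abstract attributes to its backoff scheme.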