The run queue is a critical data structure that can affect overall performance in shared-memory multiprocessor systems. Both basic run queue organizations, centralized and distributed, present performance problems, and none of the techniques that mitigate these problems is completely satisfactory. This article compares uniform-memory-access (UMA) multiprocessors with nonuniform-memory-access (NUMA) multiprocessors and describes the two basic run queue organizations, citing their main drawbacks. A look at the techniques for improving performance in these basic organizations sets the stage for the introduction of the hierarchical run queue organization, which inherits the best features of the centralized and distributed organizations while avoiding their pitfalls. In the hierarchical organization, a set of task queues is organized as a tree, and the processors, each with its local queue, are attached to the bottom level of the tree as leaf nodes. The key design issues are shown to be the tree's branching factor and the transfer factor, which indicates the number of tasks transferred from a parent queue to a child queue in the hierarchy. A comparison of the average response time of the three organizations as a function of system utilization shows that the hierarchical organization provides the best performance at all levels of system utilization. The article concludes that this run queue organization will prove to be a useful mechanism for building large-scale shared-memory multiprocessor systems.
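The tree structure described above can be illustrated with a minimal Python sketch. It is not the article's implementation: the names `QueueNode`, `build_tree`, `refill`, `next_task`, and `span` are hypothetical, new tasks are assumed to enter at the root, and each transfer is assumed to move up to `transfer_factor` tasks per processor beneath the receiving queue. Locking, which a real run queue would need, is omitted.

```python
from collections import deque


class QueueNode:
    """One task queue in the tree; leaves stand for per-processor local queues."""

    def __init__(self, parent=None, span=1):
        self.parent = parent
        self.span = span  # assumption: number of processors beneath this queue
        self.tasks = deque()

    def refill(self, transfer_factor):
        """Pull tasks from the parent queue, walking up the tree if the parent is empty."""
        if self.parent is None:
            return  # the root has no queue above it
        if not self.parent.tasks:
            self.parent.refill(transfer_factor)
        # Assumption: batch size scales with the processors served by this queue.
        wanted = transfer_factor * self.span
        for _ in range(min(wanted, len(self.parent.tasks))):
            self.tasks.append(self.parent.tasks.popleft())

    def next_task(self, transfer_factor):
        """Return the next task for this leaf's processor, or None if none is reachable."""
        if not self.tasks:
            self.refill(transfer_factor)
        return self.tasks.popleft() if self.tasks else None


def build_tree(depth, branching_factor):
    """Build a complete tree; return (root, leaves in left-to-right order)."""
    root = QueueNode(span=branching_factor ** depth)
    level = [root]
    for d in range(1, depth + 1):
        span = branching_factor ** (depth - d)
        level = [QueueNode(parent=p, span=span)
                 for p in level for _ in range(branching_factor)]
    return root, level


if __name__ == "__main__":
    root, leaves = build_tree(depth=2, branching_factor=2)
    root.tasks.extend(range(10))           # ten tasks arrive at the root
    print(leaves[0].next_task(2))          # first request walks up to the root
    print(leaves[1].next_task(2))          # sibling is served by the shared parent
    print(len(root.tasks))                 # tasks still waiting at the root
```

In this sketch only the first request from a subtree reaches the root; the sibling's request is satisfied from the intermediate queue. That is the intuition behind the two design parameters: under these assumptions, a larger branching factor gives a shallower tree with more processors sharing each queue, while a larger transfer factor means fewer trips up the tree at the cost of more tasks held low in the hierarchy.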