Scheduler activations: effective kernel support for the user-level management of parallelism
ACM Transactions on Computer Systems (TOCS)
A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Scheduling and page migration for multiprocessor compute servers
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
We present the design, implementation, and performance of a novel approach for effectively utilizing shared-memory multiprocessors in the presence of multiprogramming. Our approach offers high performance by combining the techniques of process control and processor partitioning. The process control technique is based on the principle that to maximize performance, a parallel application must dynamically match the number of runnable processes associated with it to the effective number of processors available to it. This avoids the problems arising from oblivious preemption of processes and it allows an application to work at a better operating point on its speedup versus processors curve. Processor partitioning is necessary for dealing with realistic multiprogramming environments, where both process controlled and non-controlled applications may be present. It also helps improve the cache performance of applications and removes the bottleneck associated with a single centralized scheduler. Preliminary results from an implementation of the process control approach, with a user-level server, were presented in a previous paper. In this paper, we extend the process control approach to work with processor partitioning and fully integrate the approach with the operating system kernel. This also allows us to address a limitation in our earlierEimplementation wherein a close correspondence between runnable processes and the available processors was not maintained in the presence of I/O. The paper presents the design decisions and the rationale for the current implementation, along with extensive results from executions on a high-performance Silicon Graphics 4D/340