Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
A Comprehensive Dynamic Processor Allocation Scheme for Multiprogrammed Multiprocessor Systems
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
A Malleable-Job System for Timeshared Parallel Machines
CCGRID '02 Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid
A Model For Speedup of Parallel Programs
A Model For Speedup of Parallel Programs
ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Composing parallel software efficiently with lithe
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
SBAC-PAD '10 Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing
Hi-index | 0.00 |
As hardware becomes increasingly parallel and the availability of scalable parallel software improves, the problem of managing multiple multithreaded applications (processes) becomes important. Malleable processes, which can vary the number of threads used as they run, enable sophisticated and flexible resource management. Although many existing applications parallelized for SMPs with parallel runtimes are in fact already malleable, deployed run-time environments provide no interface nor any strategy for intelligently allocating hardware threads or even preventing oversubscription. Work up until SCAF either depends upon profiling applications ahead of time in order to make good decisions about allocations, or does not account for process efficiency at all. This paper presents the Scheduling and Allocation with Feedback (SCAF) system, a drop-in runtime solution which supports existing malleable applications in making intelligent allocation decisions based on observed efficiency without any paradigm change, changes to semantics, program modification, offline profiling, or even recompilation. Our existing implementation can control most unmodified OpenMP applications. Other malleable threading libraries can also easily be supported with small modifications, without requiring application modification. In this work, we present the SCAF daemon and a SCAF-aware port of the GNU OpenMP runtime. We demonstrate that applications running on the SCAF runtime still perform well when executing on a quiescent system. We present a new technique for estimating process efficiency purely at runtime using available hardware counters, and demonstrate its effectiveness in aiding allocation decisions. We evaluated SCAF using NAS NPB parallel benchmarks. When run concurrently pairwise, 70% of benchmark pairs on an 8-core Xeon processor saw improvements averaging 15% in sum of speedups compared to equipartitioning. For a 64-context Sparc T2 processor, 57% of pairs saw a similar 15% improvement. The improvement was 45% vs. equipartitioning when three selected benchmarks were concurrently run.