Dynamically Adapting the Degree of Parallelism with Reflexive Programs
IRREGULAR '96 Proceedings of the Third International Workshop on Parallel Algorithms for Irregularly Structured Problems
Assigning additional processors to a parallel application may slow it down or lead to poor utilization of the machine. This paper demonstrates that it is possible for an application to automatically choose its own optimal degree of parallelism. The technique is based on a simple binary search procedure for finding the optimal number of processors, subject to one of the following criteria:

- maximum speed,
- maximum benefit-cost ratio, or
- maintaining an efficiency threshold.

The technique has been implemented and evaluated on a Cray T3E with 512 processors, using both kernels and real applications from mathematics, electrical engineering, and geophysics. In all tests, the optimal degree of parallelism is found quickly. Because the technique determines the optimal degree of parallelism without manual timing runs, it can shorten application runtime, reduce costs, and lead to better overall utilization of parallel computers.
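The abstract does not spell out the search procedure itself; the Python sketch below is one plausible reading, assuming the measured runtime T(p) is unimodal in the processor count p (speedup S(p) = T(1)/T(p) rises to a peak and then falls, while efficiency E(p) = S(p)/p decreases monotonically). The names choose_processors and measure_runtime, the criterion parameter, and the 0.5 default threshold are illustrative assumptions, not the paper's interface.

    def choose_processors(measure_runtime, p_max, criterion="speed", eff_threshold=0.5):
        """Binary-search the processor count p in [1, p_max].

        measure_runtime(p) -> wall-clock time of one timed program phase on p
        processors (hypothetical callback; in the paper's setting the running
        application times itself and re-partitions between phases).
        """
        t1 = measure_runtime(1)              # baseline for speedup/efficiency
        lo, hi = 1, p_max
        while lo < hi:
            mid = (lo + hi) // 2
            t_next = measure_runtime(mid + 1)

            if criterion == "speed":
                # Peak of speedup S(p) = t1 / T(p): if adding a processor
                # still reduces runtime, the optimum lies to the right of mid.
                go_right = t_next < measure_runtime(mid)
            elif criterion == "efficiency":
                # Largest p whose efficiency E(p) = t1 / (p * T(p)) stays at
                # or above the threshold, assuming E(p) decreases with p.
                go_right = t1 / ((mid + 1) * t_next) >= eff_threshold
            else:
                raise ValueError(f"unknown criterion: {criterion}")

            if go_right:
                lo = mid + 1
            else:
                hi = mid
        return lo

    if __name__ == "__main__":
        # Hypothetical Amdahl-like runtime model standing in for measurements.
        def model(p):
            return 10.0 + 1000.0 / p + 0.5 * p

        print(choose_processors(model, 512))                     # 45: fastest
        print(choose_processors(model, 512, "efficiency", 0.5))  # 36: E >= 0.5

Each probe costs a real timed run of the application, so a practical implementation would memoize measurements (e.g. with functools.cache) rather than re-measure a processor count twice. The maximum benefit-cost criterion could be folded in analogously; it is omitted here because the abstract does not define the cost model.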