If parallelism can be successfully exploited in a program, significant reductions in execution time can be achieved. However, if sections of the code are dominated by parallel overheads, the overall program performance can degrade. We propose a framework, based on an inspector-executor model, for identifying loops that are dominated by parallel overheads and dynamically serializing these loops. We implement this framework in the Polaris parallelizing compiler and evaluate two portable methods for classifying loops as profitable or unprofitable. We show that for six benchmark programs from the Perfect Club and SPEC 95 suites, parallel program execution times can be improved by as much as 85% on 16 processors of an Origin 2000.
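To make the inspector-executor idea concrete, here is a minimal sketch in Python. It is not the Polaris implementation, and the abstract does not specify the two classification methods evaluated; this sketch assumes the simplest possible classifier, a direct comparison of one timed parallel run against one timed serial run. The names `AdaptiveLoop` and `is_profitable` are hypothetical, and a thread pool stands in for the parallel runtime.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def is_profitable(serial_time, parallel_time):
    """Hypothetical classifier: parallel is profitable only if it beat serial."""
    return parallel_time < serial_time


class AdaptiveLoop:
    """Inspects the first two invocations of a loop, then fixes its mode."""

    def __init__(self, body, workers=4, clock=time.perf_counter):
        self.body = body        # function applied to each iteration index
        self.workers = workers  # size of the stand-in thread pool
        self.clock = clock      # injectable so the decision can be tested
        self.timings = {}       # mode -> measured wall time
        self.mode = None        # "serial" or "parallel" once decided

    def _run_serial(self, n):
        return [self.body(i) for i in range(n)]

    def _run_parallel(self, n):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(self.body, range(n)))

    def _runner(self, mode):
        return self._run_serial if mode == "serial" else self._run_parallel

    def __call__(self, n):
        if self.mode is not None:
            # Executor phase: reuse the decision without further timing.
            return self._runner(self.mode)(n)

        # Inspector phase: time one mode per invocation, parallel first.
        mode = "parallel" if "parallel" not in self.timings else "serial"
        start = self.clock()
        result = self._runner(mode)(n)
        self.timings[mode] = self.clock() - start

        if len(self.timings) == 2:
            profitable = is_profitable(self.timings["serial"],
                                       self.timings["parallel"])
            self.mode = "parallel" if profitable else "serial"
        return result


if __name__ == "__main__":
    loop = AdaptiveLoop(lambda i: i * i)
    for _ in range(3):
        loop(1000)
    print("chosen mode:", loop.mode)
```

A loop with a tiny body, as in the usage lines, will typically be classified as unprofitable and serialized, since thread-pool startup dominates the work. A production scheme would likely need to re-inspect when the iteration count or processor count changes; this sketch decides once and never revisits the choice.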