Allocating Independent Subtasks on Parallel Processors
IEEE Transactions on Software Engineering
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
Determining average program execution times and their variance
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Factoring: a practical and robust method for scheduling parallel loops
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Low-overhead scheduling of nested parallelism
IBM Journal of Research and Development
Automatic partitioning of a program dependence graph into parallel tasks
IBM Journal of Research and Development
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Automated support for legacy code understanding
Communications of the ACM
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
On Estimating and Enhancing Cache Effectiveness
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
False Sharing Elimination by Selection of Runtime Scheduling Parameters
ICPP '97 Proceedings of the international Conference on Parallel Processing
Optimized Execution of Fortran 90 Array Language on Symmetric Shared-Memory Multiprocessors
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Loop Transformations for Hierarchical Parallelism and Locality
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
A compiler for exploiting nested parallelism in OpenMP programs
Parallel Computing - OpenMp
Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification
Proceedings of the 44th annual Design Automation Conference
OpenMP tasks in IBM XL compilers
CASCON '08 Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds
Exploring parallelization strategies for NUFFT data translation
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Experimenting with low-overhead OpenMP runtime on IBM Blue Gene/Q
IBM Journal of Research and Development
Hi-index | 0.00 |
The trend in workstation hardware is towards symmetric shared-memory multiprocessors (SMPs). User expectations are for (largely) automatic exploitation of parallelism on an SMP, similar to automatic exploitation of modern processor features such as caches and instruction scheduling.In this paper, we present our solution to automatic SMP parallelization. Our solution is unique in its robust support for unbalanced processor loads and nesting of parallel loops and parallel sections, in conjunction with its tight integration with high-order transformations for improved uniprocessor performance, so that the speedup due to parallelism is truly a multiplicative speedup over highly optimized uniprocessor execution times.