Automatic parallelization for symmetric shared-memory multiprocessors

Authors:
Jyh-Herng Chow;Leonard E. Lyon;Vivek Sarkar
Affiliations:
Application Development Technology Institute, IBM Software Solutions Division;Application Development Technology Institute, IBM Software Solutions Division;Application Development Technology Institute, IBM Software Solutions Division
Venue:
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Year:
1996

Abstract

The trend in workstation hardware is towards symmetric shared-memory multiprocessors (SMPs). Users expect parallelism on an SMP to be exploited (largely) automatically, just as modern processor features such as caches and instruction scheduling are exploited automatically. In this paper, we present our solution to automatic SMP parallelization. Our solution is unique in combining robust support for unbalanced processor loads and for nesting of parallel loops and parallel sections with tight integration with high-order transformations for improved uniprocessor performance. As a result, the speedup due to parallelism is truly a multiplicative speedup over highly optimized uniprocessor execution times.
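
The "multiplicative speedup" claim can be read as a simple factorization. The sketch below is one interpretation, not a formula taken from the paper; the symbols $T_{\mathrm{base}}$ (unoptimized uniprocessor time), $T_{\mathrm{opt}}$ (uniprocessor time after high-order transformations), and $T_{\mathrm{par}}(p)$ (time on $p$ processors after parallelizing the optimized code) are assumptions introduced here for illustration.

```latex
\[
\underbrace{\frac{T_{\mathrm{base}}}{T_{\mathrm{par}}(p)}}_{\text{overall speedup}}
\;=\;
\underbrace{\frac{T_{\mathrm{base}}}{T_{\mathrm{opt}}}}_{\text{uniprocessor optimization}}
\;\times\;
\underbrace{\frac{T_{\mathrm{opt}}}{T_{\mathrm{par}}(p)}}_{\text{parallel speedup}}
\]
% Hypothetical numbers, for illustration only:
% if T_base / T_opt = 2 and T_opt / T_par(4) = 3,
% the overall speedup is 2 * 3 = 6.
```

The point of the factorization is that the parallel speedup is measured against $T_{\mathrm{opt}}$, the already optimized uniprocessor time, rather than against the slower unoptimized baseline.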