Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
A static performance estimator to guide data partitioning decisions
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Predicting program behavior using real or estimated profiles
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
Automatic partitioning of a program dependence graph into parallel tasks
IBM Journal of Research and Development
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Precise compile-time performance prediction for superscalar-based computers
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Accurate static estimators for program optimization
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Compiling Fortran 90D/HPF for distributed memory MIMD computers
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
An approach to communication-efficient data redistribution
ICS '94 Proceedings of the 8th international conference on Supercomputing
Software overhead in messaging layers: where does the time go?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Where is time spent in message-passing and shared-memory programs?
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
A Scalable Scheduling Scheme for Functional Parallelism on Distributed Memory Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
Optimal mapping of sequences of data parallel tasks
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Program repartitioning on varying communication cost parallel architectures
Journal of Parallel and Distributed Computing
IEEE Transactions on Parallel and Distributed Systems
Optimal Scheduling Algorithm for Distributed-Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Loop Parallelization
Partitioning and Scheduling Parallel Programs for Multiprocessors
Partitioning and Scheduling Parallel Programs for Multiprocessors
Requirements for Data-Parallel Programming Environments
IEEE Parallel & Distributed Technology: Systems & Technology
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
Automatic Extraction of Functional Parallelism from Ordinary Programs
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
On the Granularity and Clustering of Directed Acyclic Task Graphs
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Solving Alignment Using Elementary Linear Algebra
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Effect of variation in compile time costs on scheduling tasks on distributed memory systems
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
(R) A Compile Time Partitioning Method for DOALL Loops on Distributed Memory Systems
ICPP '96 Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
A Robust Compile Time Method for Scheduling Task Parallelism on Distributed Memory Machines
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Most of the reported work in the parallelizing-compiler literature focuses on analyzing program characteristics such as dependencies, loop structures, and memory reference patterns to optimize the generated parallel code [3, 2, 7, 8, 14, 10]. Unfortunately, parallelizing compilers have little or no knowledge of the actual run-time behavior of the synthesized code, because the underlying hardware and software subsystems behave in complex ways. The interaction of the code with these subsystems can significantly affect the performance of the generated code and must be considered during the program partitioning phases of the compiler. In this paper, we present an efficient and accurate program partitioning approach for parallel architectures that is based on a performance model. We introduce the concept of behavioral edges, which capture the interactions between computation and communication through parametric functions. We present an efficient algorithm that identifies behavioral edges, modifies costs using the behavioral edges, and adapts the schedule to reduce the schedule length. The program partitioning phase uses the static estimates computed from the behavioral edges, and partitioning is performed iteratively using the ordering PDG based on computed intervals. Our framework demonstrates a significant performance improvement (a factor of 10 in many cases).
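To make the idea of adjusting static costs more concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes a toy task graph with fixed compute and communication costs, and it models a behavioral edge as a hypothetical pair (affected edge, interfering edge) with a parametric function that rescales the affected communication cost. The names `schedule_length`, `apply_behavioral_edges`, and the contention function are illustrative assumptions, as are all the numbers.

```python
# Toy illustration only: every name and number here is an assumption,
# not taken from the paper.

def schedule_length(compute, comm, order):
    """Longest path through a task DAG whose nodes are given in topological order."""
    finish = {}
    for node in order:
        # A task starts once every predecessor has finished and its message has arrived.
        start = max(
            (finish[u] + cost for (u, v), cost in comm.items() if v == node),
            default=0.0,
        )
        finish[node] = start + compute[node]
    return max(finish.values())


def apply_behavioral_edges(comm, behavioral_edges):
    """Return a copy of the communication costs adjusted by parametric functions.

    Each behavioral edge is ((affected, interfering), fn), where
    fn(base_cost, interfering_cost) yields the adjusted cost of `affected`.
    """
    adjusted = dict(comm)
    for (affected, interfering), fn in behavioral_edges:
        adjusted[affected] = fn(comm[affected], comm[interfering])
    return adjusted


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
compute = {"a": 2, "b": 3, "c": 3, "d": 1}
comm = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 2}
order = ["a", "b", "c", "d"]

# Assumed contention model: two messages arriving at d at the same time
# slow the first one down by half the cost of the second.
behavioral_edges = [
    ((("b", "d"), ("c", "d")), lambda base, other: base + 0.5 * other),
]

static_length = schedule_length(compute, comm, order)
adjusted_comm = apply_behavioral_edges(comm, behavioral_edges)
adjusted_length = schedule_length(compute, adjusted_comm, order)

print(static_length, adjusted_length)
```

In this sketch the adjusted estimate is longer than the purely static one, which is the kind of discrepancy a partitioner could use to revise its schedule. How behavioral edges are actually identified and how the partitioning iterates over the PDG is not specified in the abstract and is not modeled here.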